LLM environments

Gymnasium-style environments for supervised fine-tuning, preference optimization, and reasoning RL. These types are also re-exported from agilerl.utils.llm_utils for backwards compatibility.

class agilerl.llm_envs.HuggingFaceGym(train_dataset: Dataset, test_dataset: Dataset, tokenizer: AutoTokenizer, conversation_template: list[dict[str, str]] | None, data_batch_size_per_gpu: int = 8, max_context_length: int | None = None, min_completion_length: int | None = None, accelerator: Accelerator | None = None, seed: int = 42)

Abstract base class for HuggingFace Gymnasium environments.

class agilerl.llm_envs.SFTGym(train_dataset: Dataset, test_dataset: Dataset, tokenizer: AutoTokenizer, data_batch_size_per_gpu: int = 8, response_column: str = 'target', accelerator: Accelerator | None = None, max_context_length: int | None = None, seed: int = 42)

Gymnasium-style environment for supervised fine-tuning (SFT) datasets.

class agilerl.llm_envs.PreferenceGym(train_dataset: Dataset, test_dataset: Dataset, tokenizer: AutoTokenizer, data_batch_size_per_gpu: int = 8, accelerator: Accelerator | None = None, max_context_length: int | None = None, min_completion_length: int | None = None, seed: int = 42)

Class to convert HuggingFace preference datasets into Gymnasium style environment.

class agilerl.llm_envs.ReasoningGym(train_dataset: Dataset, test_dataset: Dataset, tokenizer: AutoTokenizer, reward_fn: Callable[[str, str, str], float], conversation_template: list[dict[str, str]], data_batch_size_per_gpu: int = 8, accelerator: Accelerator | None = None, return_raw_completions: bool = False, max_context_length: int | None = None, seed: int = 42)

Class to convert HuggingFace datasets into Gymnasium style environment.

agilerl.llm_envs.apply_chat_template(conversation_template: list[dict[str, str]], question: str, answer: str, tokenizer: AutoTokenizer) BatchEncoding

Create and tokenize a chat template for a reasoning task.

Parameters:
  • conversation_template (list[dict[str, str]]) – The conversation template to be tokenized.

  • question (str) – The question to be tokenized.

  • answer (str) – The answer to be tokenized.

  • tokenizer (AutoTokenizer) – The tokenizer to be used.

Returns:

The tokenized prompt.

Return type:

BatchEncoding