LLM environments¶
Gymnasium-style environments for supervised fine-tuning, preference optimization,
and reasoning RL. These types are also re-exported from
agilerl.utils.llm_utils for backwards compatibility.
- class agilerl.llm_envs.HuggingFaceGym(train_dataset: Dataset, test_dataset: Dataset, tokenizer: AutoTokenizer, conversation_template: list[dict[str, str]] | None, data_batch_size_per_gpu: int = 8, max_context_length: int | None = None, min_completion_length: int | None = None, accelerator: Accelerator | None = None, seed: int = 42)¶
Abstract base class for HuggingFace Gymnasium environments.
- class agilerl.llm_envs.SFTGym(train_dataset: Dataset, test_dataset: Dataset, tokenizer: AutoTokenizer, data_batch_size_per_gpu: int = 8, response_column: str = 'target', accelerator: Accelerator | None = None, max_context_length: int | None = None, seed: int = 42)¶
Gymnasium-style environment for supervised fine-tuning (SFT) datasets.
- class agilerl.llm_envs.PreferenceGym(train_dataset: Dataset, test_dataset: Dataset, tokenizer: AutoTokenizer, data_batch_size_per_gpu: int = 8, accelerator: Accelerator | None = None, max_context_length: int | None = None, min_completion_length: int | None = None, seed: int = 42)¶
Class to convert HuggingFace preference datasets into Gymnasium style environment.
- class agilerl.llm_envs.ReasoningGym(train_dataset: Dataset, test_dataset: Dataset, tokenizer: AutoTokenizer, reward_fn: Callable[[str, str, str], float], conversation_template: list[dict[str, str]], data_batch_size_per_gpu: int = 8, accelerator: Accelerator | None = None, return_raw_completions: bool = False, max_context_length: int | None = None, seed: int = 42)¶
Class to convert HuggingFace datasets into Gymnasium style environment.