LLM Utils

class agilerl.utils.llm_utils.HuggingFaceGym(train_dataset: str, test_dataset, tokenizer: AutoTokenizer, reward_fn: Callable[[str, str, str], float], apply_chat_template_fn: Callable[[str, str, AutoTokenizer], BatchEncoding], max_answer_tokens: int = 512, data_batch_size: int = 8, custom_collate_fn: Callable | None = None)

Class to convert HuggingFace datasets into a Gymnasium-style environment.

Parameters:
  • train_dataset (str) – Training dataset to be loaded from HuggingFace datasets.

  • test_dataset – Test dataset to be loaded from HuggingFace datasets.

  • tokenizer (AutoTokenizer) – Tokenizer to be used for encoding and decoding the prompts.

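  • reward_fn (Callable[[str, str, str], float]) – Reward function for evaluating completions.

  • apply_chat_template_fn (Callable[[str, str, AutoTokenizer], BatchEncoding]) – Function that applies the chat template to the prompts using the tokenizer and returns the encoded batch.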

  • max_answer_tokens (int, optional) – Max number of answer tokens, defaults to 512

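  • data_batch_size (int, optional) – DataLoader batch size, defaults to 8

  • custom_collate_fn (Callable | None, optional) – Custom collate function for the DataLoader, defaults to None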
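
As an illustration of the reward_fn parameter, the following is a minimal sketch of a function matching the Callable[[str, str, str], float] signature. The function name exact_match_reward and the argument order (completion, answer, question) are assumptions made for this example; the signature above only fixes three strings in and a float out, so check the order against the version of the library in use.

```python
import re


def exact_match_reward(completion: str, answer: str, question: str) -> float:
    """Return 1.0 if the last number in the completion equals the reference answer.

    Assumed argument order: (completion, answer, question). The question is
    accepted to satisfy the three-string signature but is not used here.
    """
    # Collect every integer or decimal number that appears in the completion.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        # No numeric answer was produced, so no reward is given.
        return 0.0
    # Compare the final number in the completion with the reference answer.
    return 1.0 if numbers[-1] == answer.strip() else 0.0


print(exact_match_reward("2 + 2 = 4", "4", "What is 2 + 2?"))
```

A function like this would be passed as reward_fn when constructing HuggingFaceGym. String comparison keeps the sketch simple; a numeric comparison would be needed to treat "4" and "4.0" as equal.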