General Utils
- agilerl.utils.utils.make_vect_envs(env_name: str | None = None, num_envs: int = 1, *, make_env: Callable[[...], Any] | None = None, should_async_vector: bool = True, **env_kwargs: Any) → Any
Return vectorized gym environments (asynchronous by default).
- Parameters:
env_name (str) – Gym environment name
num_envs (int, optional) – Number of vectorized environments, defaults to 1
make_env (Callable, optional) – Function that creates a gym environment, defaults to using gym.make(env_name)
should_async_vector (bool, optional) – Whether to use asynchronous vectorized environments, defaults to True
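One way to picture how num_envs, make_env, and env_kwargs combine is as a list of zero-argument factories, one per environment copy, that a vector wrapper later calls. The sketch below only illustrates that pattern and does not reproduce AgileRL's implementation or call Gymnasium. DummyEnv and build_env_factories are hypothetical names, and passing only env_kwargs to a user-supplied make_env is an assumption.

```python
from __future__ import annotations

from functools import partial
from typing import Any, Callable, Optional


class DummyEnv:
    """Hypothetical stand-in for a gym environment (illustration only)."""

    def __init__(self, name: str, **kwargs: Any) -> None:
        self.name = name
        self.kwargs = kwargs


def build_env_factories(
    env_name: str,
    num_envs: int = 1,
    make_env: Optional[Callable[..., Any]] = None,
    **env_kwargs: Any,
) -> list[Callable[[], Any]]:
    """Return one zero-argument factory per environment copy (assumed pattern)."""
    if make_env is None:
        # Assumption: this default plays the role of gym.make(env_name).
        make_env = partial(DummyEnv, env_name)
    # Each factory builds an independent environment with the same kwargs.
    return [partial(make_env, **env_kwargs) for _ in range(num_envs)]


factories = build_env_factories("CartPole-v1", num_envs=3, render_mode=None)
envs = [factory() for factory in factories]
print(len(envs), envs[0].name, envs[0].kwargs)
```

Keeping the factories as callables, rather than building the environments eagerly, is what lets an asynchronous wrapper construct each copy inside its own worker process.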
- agilerl.utils.utils.make_multi_agent_vect_envs(env: Callable[[], ParallelEnv], num_envs: int = 1, **env_kwargs: Any) → AsyncPettingZooVecEnv
Return async-vectorized PettingZoo parallel environments.
- Parameters:
env (Callable[[], pettingzoo.utils.env.ParallelEnv]) – Callable that returns a PettingZoo parallel environment object
num_envs (int, optional) – Number of vectorized environments, defaults to 1
- Returns:
Async-vectorized PettingZoo parallel environments
- Return type:
AsyncPettingZooVecEnv
- agilerl.utils.utils.make_skill_vect_envs(env_name: str, skill: Any, num_envs: int = 1) → AsyncVectorEnv
Return async-vectorized gym environments.
- Parameters:
env_name (str) – Gym environment name
skill (agilerl.wrappers.learning.Skill) – Skill wrapper to apply to environment
num_envs (int, optional) – Number of vectorized environments, defaults to 1
- agilerl.utils.utils.observation_space_channels_to_first(observation_space: Box | Discrete | Dict | Tuple) → Box | Discrete | Dict | Tuple
Swap the channel order of an observation space from [H, W, C] -> [C, H, W].
- Parameters:
observation_space (spaces.Box, spaces.Dict, spaces.Tuple, spaces.Discrete) – Observation space
- Returns:
Observation space with swapped channels
- Return type:
spaces.Box, spaces.Dict, spaces.Tuple, spaces.Discrete
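The [H, W, C] -> [C, H, W] swap can be illustrated on a plain shape tuple and on a small nested-list image. This is a minimal sketch of the axis permutation only, not the library's code. Both helper names are hypothetical, and returning non-image shapes unchanged is an assumption.

```python
from __future__ import annotations


def channels_to_first_shape(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Move the trailing channel axis of an [H, W, C] shape to the front."""
    if len(shape) != 3:
        # Assumption: shapes that are not image-like are left unchanged.
        return shape
    height, width, channels = shape
    return (channels, height, width)


def image_channels_to_first(image: list) -> list:
    """Reorder a nested-list image from [H][W][C] to [C][H][W]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 2 x 2 image with 3 channels per pixel.
image = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
print(channels_to_first_shape((84, 84, 3)))  # (3, 84, 84)
print(image_channels_to_first(image)[0])     # [[1, 4], [7, 10]]
```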
- agilerl.utils.utils.create_population(algo: str, net_config: dict[str, Any] | None, INIT_HP: dict[str, Any], observation_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | list[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary] | None = None, action_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | list[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary] | None = None, hp_config: HyperparameterConfig | None = None, actor_network: EvolvableModule | None = None, critic_network: EvolvableModule | None = None, agent_wrapper: Callable | None = None, wrapper_kwargs: dict[str, Any] | None = None, population_size: int = 1, num_envs: int = 1, device: str = 'cpu', accelerator: Any | None = None, torch_compiler: Any | None = None, tokenizer: Any | None = None, model_name: str | None = None, lora_config: Any | None = None, vllm_config: Any | None = None, algo_kwargs: dict[str, Any] | None = None) → list[EvolvableAlgorithmProtocol]
Return population of identical agents.
- Parameters:
algo (str) – RL algorithm
net_config (dict or None) – Network configuration
INIT_HP (dict) – Initial hyperparameters
observation_space (spaces.Space) – Observation space
action_space (spaces.Space) – Action space
hp_config (HyperparameterConfig, optional) – Choice of algorithm hyperparameters to mutate during training, defaults to None
actor_network (nn.Module, optional) – Custom actor network, defaults to None
critic_network (nn.Module, optional) – Custom critic network, defaults to None
population_size (int, optional) – Number of agents in population, defaults to 1
num_envs (int, optional) – Number of vectorized environments, defaults to 1
device (str, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to ‘cpu’
accelerator (accelerate.Accelerator(), optional) – Accelerator for distributed computing, defaults to None
torch_compiler (Any, optional) – Torch compiler, defaults to None
tokenizer (Any, optional) – Hugging Face tokenizer; used to default pad_token_id / pad_token for GRPO / DPO / LLMPPO / LLMREINFORCE when not set in algo_kwargs.
model_name (str, optional) – HF model id or path; defaults algo_kwargs['model_name'] or INIT_HP['MODEL_NAME'] for LLM agents.
lora_config (Any, optional) – peft.LoraConfig; if omitted, built from LORA_* / TARGET_MODULES keys in INIT_HP when present.
vllm_config (Any, optional) – VLLMConfig for GRPO / LLMPPO / LLMREINFORCE (ignored for DPO).
algo_kwargs (dict, optional) – Additional keyword arguments for the algorithm
- Returns:
Population of agents
- Return type:
list[EvolvableAlgorithmProtocol]
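The lora_config description says a configuration is built from LORA_* / TARGET_MODULES keys in INIT_HP when they are present. The sketch below shows only the key-gathering step, as one reading of that sentence. collect_lora_settings is a hypothetical helper, the example hyperparameter names are invented for illustration, and how the values map onto peft.LoraConfig fields is deliberately left out because it is not documented here.

```python
from __future__ import annotations

from typing import Any


def collect_lora_settings(init_hp: dict[str, Any]) -> dict[str, Any] | None:
    """Gather LORA_* and TARGET_MODULES entries from an INIT_HP-style dict.

    Returns None when no such keys exist, mirroring the "when present"
    condition in the parameter description.
    """
    settings = {
        key: value
        for key, value in init_hp.items()
        if key.startswith("LORA_") or key == "TARGET_MODULES"
    }
    return settings or None


# Hypothetical hyperparameter names, chosen only for illustration.
INIT_HP = {
    "BATCH_SIZE": 8,
    "LORA_R": 16,
    "LORA_ALPHA": 32,
    "TARGET_MODULES": ["q_proj"],
}
print(collect_lora_settings(INIT_HP))
print(collect_lora_settings({"BATCH_SIZE": 8}))
```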
- agilerl.utils.utils.save_population_checkpoint(population: list[EvolvableAlgorithmProtocol], save_path: str, overwrite_checkpoints: bool, accelerator: Accelerator | None = None) → None
Save checkpoint of population of agents.
- agilerl.utils.utils.save_llm_checkpoint(agent: LLMAlgorithm, checkpoint_path: str | None) → None
Checkpoint the LLM, saving LoRA adapter weights via HuggingFace save_pretrained.
The saved directory can be reloaded with:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("<base-model-name>")
model = PeftModel.from_pretrained(base_model, "<checkpoint_path>/actor/")

- Parameters:
agent (LLMAlgorithm) – Agent
checkpoint_path (str) – Checkpoint path, used as-is (no algo sub-directory is appended). Defaults to "./saved_checkpoints" when None.
- agilerl.utils.utils.tournament_selection_and_mutation(population: list[EvolvableAlgorithmProtocol], tournament: TournamentSelection, mutation: Mutations, env_name: str, algo: str | None = None, elite_path: str | None = None, save_elite: bool = False, accelerator: Accelerator | None = None, language_model: bool | None = False) → list[EvolvableAlgorithmProtocol]
Perform tournament selection and mutation on a population of agents.
- Parameters:
population (list[PopulationType]) – Population of agents
tournament (TournamentSelection) – Tournament selection object
mutation (Mutations) – Mutation object
env_name (str) – Environment name
elite_path (str, optional) – Path to save elite agent, defaults to None
save_elite (bool, optional) – Flag to save elite agent, defaults to False
accelerator (accelerate.Accelerator(), optional) – Accelerator for distributed computing, defaults to None
language_model (bool, optional) – Flag indicating whether the population consists of language-model agents, defaults to False
- Returns:
Population of agents after tournament selection and mutation
- Return type:
list[PopulationType]
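For readers unfamiliar with the technique, tournament selection with elitism can be sketched generically as follows. This is a textbook-style illustration and not AgileRL's TournamentSelection or Mutations classes. ToyAgent and tournament_select are hypothetical names, and a real implementation would typically clone each winner and then mutate the clones.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical agent carrying only an index and a fitness score."""

    index: int
    fitness: float


def tournament_select(
    population: list[ToyAgent],
    tournament_size: int,
    rng: random.Random,
    elitism: bool = True,
) -> tuple[ToyAgent, list[ToyAgent]]:
    """Return the elite agent and a new population of tournament winners."""
    elite = max(population, key=lambda agent: agent.fitness)
    # With elitism, the best agent always survives into the next generation.
    new_population = [elite] if elitism else []
    while len(new_population) < len(population):
        # Draw contenders without replacement and keep the fittest one.
        contenders = rng.sample(population, tournament_size)
        new_population.append(max(contenders, key=lambda agent: agent.fitness))
    return elite, new_population


rng = random.Random(0)
population = [ToyAgent(index=i, fitness=float(i)) for i in range(6)]
elite, next_generation = tournament_select(population, tournament_size=2, rng=rng)
print(elite.index, len(next_generation))
```

Larger tournament sizes increase selection pressure, since weaker agents are less likely to win any given draw.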
- agilerl.utils.utils.init_wandb(algo: str, env_name: str, init_hyperparams: dict[str, Any] | None = None, mutation_hyperparams: dict[str, Any] | None = None, wandb_api_key: str | None = None, accelerator: Accelerator | None = None, project: str = 'AgileRL', addl_args: dict[str, Any] | None = None) → None
Initialize wandb for logging hyperparameters and run metadata.
- Parameters:
algo (str) – RL algorithm
env_name (str) – Environment name
init_hyperparams (dict, optional) – Initial hyperparameters, defaults to None
mutation_hyperparams (dict, optional) – Mutation hyperparameters, defaults to None
wandb_api_key (str, optional) – Wandb API key, defaults to None
accelerator (accelerate.Accelerator(), optional) – Accelerator for distributed computing, defaults to None
addl_args (dict, optional) – Additional kwargs to pass to wandb.init()
- agilerl.utils.utils.calculate_vectorized_scores(rewards: ndarray, terminations: ndarray, include_unterminated: bool = False, only_first_episode: bool = True) → list[float]
Calculate the vectorized scores for episodes based on rewards and terminations.
- Parameters:
rewards (np.ndarray) – Array of rewards for each environment.
terminations (np.ndarray) – Array indicating termination points for each environment.
include_unterminated (bool, optional) – Whether to include rewards from unterminated episodes, defaults to False.
only_first_episode (bool, optional) – Whether to consider only the first episode, defaults to True.
- Returns:
List of episode rewards.
- Return type:
list[float]
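The interaction between include_unterminated and only_first_episode is easier to see on a small example. The pure-Python sketch below implements one reading of the parameter descriptions above: each termination flag closes an episode, and a trailing run of steps with no termination counts as an unterminated episode. sketch_vectorized_scores is a hypothetical name, and the library may treat edge cases differently, for example rows with no termination at all.

```python
from __future__ import annotations


def sketch_vectorized_scores(
    rewards: list[list[float]],
    terminations: list[list[int]],
    include_unterminated: bool = False,
    only_first_episode: bool = True,
) -> list[float]:
    """Sum rewards per episode for each environment row (assumed semantics)."""
    scores: list[float] = []
    for env_rewards, env_terms in zip(rewards, terminations):
        episodes = []  # (total_reward, terminated) for each episode in this row
        running, steps = 0.0, 0
        for reward, terminated in zip(env_rewards, env_terms):
            running += reward
            steps += 1
            if terminated:
                episodes.append((running, True))
                running, steps = 0.0, 0
        if steps > 0:
            # Steps left over after the last termination: unterminated episode.
            episodes.append((running, False))
        kept = [total for total, done in episodes if done or include_unterminated]
        scores.extend(kept[:1] if only_first_episode else kept)
    return scores


rewards = [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
terminations = [[0, 1, 0, 0], [0, 0, 0, 1]]

print(sketch_vectorized_scores(rewards, terminations))
# [3.0, 4.0]
print(sketch_vectorized_scores(rewards, terminations, True, False))
# [3.0, 7.0, 4.0]
```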
- agilerl.utils.utils.print_hyperparams(pop: list[EvolvableAlgorithmProtocol]) → None
Print current hyperparameters of agents in a population and their fitnesses.
- Parameters:
pop (list[EvolvableAlgorithm]) – Population of agents
- agilerl.utils.utils.plot_population_score(pop: list[EvolvableAlgorithmProtocol]) → None
Plot the fitness scores of agents in a population.
- Parameters:
pop (list[EvolvableAlgorithm]) – Population of agents
- agilerl.utils.utils.get_env_defined_actions(info: dict[str, Any], agents: list[str]) → dict[str, Any] | None
Get the environment-defined actions for a list of agents.