General Utils

agilerl.utils.utils.make_vect_envs(env_name: str | None = None, num_envs=1, *, make_env: Callable | None = None, should_async_vector: bool = True, **env_kwargs)

Returns vectorized gym environments, asynchronous by default.

Parameters:
  • env_name (str, optional) – Gym environment name; required if make_env is not provided

  • num_envs (int, optional) – Number of vectorized environments, defaults to 1

  • make_env (Callable, optional) – Function that creates a gym environment, defaults to gym.make(env_name)

  • should_async_vector (bool, optional) – Whether to vectorize the environments asynchronously, defaults to True
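
For example, a minimal sketch creating eight asynchronous copies of a Gymnasium environment (the environment id is illustrative):

    from agilerl.utils.utils import make_vect_envs

    # Create 8 async-vectorized copies of the environment by name
    env = make_vect_envs("LunarLander-v2", num_envs=8)
    obs, info = env.reset()

    # Alternatively, pass a factory via make_env instead of a name:
    # import gymnasium as gym
    # env = make_vect_envs(make_env=lambda: gym.make("LunarLander-v2"), num_envs=8)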

agilerl.utils.utils.make_multi_agent_vect_envs(env: Callable[[], ParallelEnv], num_envs: int = 1, **env_kwargs: Any) → AsyncPettingZooVecEnv

Returns async-vectorized PettingZoo parallel environments.

Parameters:
  • env (Callable[[], pettingzoo.utils.env.ParallelEnv]) – Function that returns a PettingZoo parallel environment object

  • num_envs (int, optional) – Number of vectorized environments, defaults to 1
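
A sketch using a PettingZoo parallel environment factory (the MPE environment and its continuous_actions kwarg are illustrative and require pettingzoo[mpe]):

    from pettingzoo.mpe import simple_speaker_listener_v4
    from agilerl.utils.utils import make_multi_agent_vect_envs

    # env expects a zero-argument callable returning a ParallelEnv;
    # extra env_kwargs are forwarded to that callable
    env = make_multi_agent_vect_envs(
        env=simple_speaker_listener_v4.parallel_env,
        num_envs=8,
        continuous_actions=True,
    )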

agilerl.utils.utils.make_skill_vect_envs(env_name: str, skill: Any, num_envs: int = 1) → AsyncVectorEnv

Returns async-vectorized gym environments with the given skill wrapper applied.

Parameters:
  • env_name (str) – Gym environment name

  • skill (agilerl.wrappers.learning.Skill) – Skill wrapper to apply to environment

  • num_envs (int, optional) – Number of vectorized environments, defaults to 1
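
A hedged sketch; the skill_reward hook shown on the Skill subclass is an assumption about the wrapper interface, so consult the Skill documentation for the exact hook names:

    from agilerl.utils.utils import make_skill_vect_envs
    from agilerl.wrappers.learning import Skill

    class StabilizeSkill(Skill):
        # Hypothetical skill that would reshape the reward to encourage a
        # specific behaviour (identity pass-through here for illustration)
        def skill_reward(self, observation, reward, terminated, truncated, info):
            return observation, reward, terminated, truncated, info

    env = make_skill_vect_envs("LunarLander-v2", skill=StabilizeSkill, num_envs=4)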

agilerl.utils.utils.observation_space_channels_to_first(observation_space: Box | Discrete | Dict | Tuple) → Box | Discrete | Dict | Tuple

Swaps the channel order of an observation space from [H, W, C] -> [C, H, W].

Parameters:

observation_space (spaces.Box, spaces.Dict, spaces.Tuple, spaces.Discrete) – Observation space

Returns:

Observation space with swapped channels

Return type:

spaces.Box, spaces.Dict, spaces.Tuple, spaces.Discrete
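
For example, converting an image observation space from channels-last to channels-first:

    import numpy as np
    from gymnasium import spaces
    from agilerl.utils.utils import observation_space_channels_to_first

    hwc_space = spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8)
    chw_space = observation_space_channels_to_first(hwc_space)
    print(chw_space.shape)  # (3, 84, 84)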

agilerl.utils.utils.create_population(algo: str, observation_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | List[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary], action_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | List[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary], net_config: Dict[str, Any] | None, INIT_HP: Dict[str, Any], hp_config: HyperparameterConfig | None = None, actor_network: EvolvableModule | None = None, critic_network: EvolvableModule | None = None, agent_wrapper: Callable | None = None, wrapper_kwargs: Dict[str, Any] | None = None, population_size: int = 1, num_envs: int = 1, device: str = 'cpu', accelerator: Any | None = None, torch_compiler: Any | None = None) → List[EvolvableAlgorithm]

Returns a population of identical agents.

Parameters:
  • algo (str) – RL algorithm

  • observation_space (spaces.Space) – Observation space

  • action_space (spaces.Space) – Action space

  • net_config (dict or None) – Network configuration

  • INIT_HP (dict) – Initial hyperparameters

  • hp_config (HyperparameterConfig, optional) – Choice of algorithm hyperparameters to mutate during training, defaults to None

  • actor_network (nn.Module, optional) – Custom actor network, defaults to None

  • critic_network (nn.Module, optional) – Custom critic network, defaults to None

  • agent_wrapper (Callable, optional) – Wrapper to apply to each agent in the population, defaults to None

  • wrapper_kwargs (dict, optional) – Keyword arguments for the agent wrapper, defaults to None

  • population_size (int, optional) – Number of agents in population, defaults to 1

  • num_envs (int, optional) – Number of vectorized environments, defaults to 1

  • device (str, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to ‘cpu’

  • accelerator (accelerate.Accelerator(), optional) – Accelerator for distributed computing, defaults to None

  • torch_compiler (Any, optional) – Torch compiler, defaults to None

Returns:

Population of agents

Return type:

list[EvolvableAlgorithm]
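
A minimal sketch creating a small DQN population; the net_config and INIT_HP keys shown follow common AgileRL tutorial settings and are abbreviated, so the exact required keys depend on the algorithm and library version:

    from agilerl.utils.utils import create_population, make_vect_envs

    num_envs = 8
    env = make_vect_envs("LunarLander-v2", num_envs=num_envs)

    pop = create_population(
        algo="DQN",
        observation_space=env.single_observation_space,
        action_space=env.single_action_space,
        net_config={"encoder_config": {"hidden_size": [64, 64]}},  # illustrative
        INIT_HP={"BATCH_SIZE": 64, "LR": 1e-3, "GAMMA": 0.99},     # abbreviated
        population_size=4,
        num_envs=num_envs,
        device="cpu",
    )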

agilerl.utils.utils.save_population_checkpoint(population: List[EvolvableAlgorithm], save_path: str, overwrite_checkpoints: bool, accelerator: Accelerator | None = None) → None

Saves a checkpoint of a population of agents.

Parameters:
  • population (list[PopulationType]) – Population of agents

  • save_path (str) – Path to save checkpoint

  • overwrite_checkpoints (bool) – Flag to overwrite checkpoints

  • accelerator (accelerate.Accelerator(), optional) – Accelerator for distributed computing, defaults to None
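
A sketch assuming pop is a population returned by create_population (the path is illustrative):

    from agilerl.utils.utils import save_population_checkpoint

    # Save every agent in the population; with overwrite_checkpoints=True,
    # each save replaces the previously written checkpoint files
    save_population_checkpoint(
        population=pop,
        save_path="checkpoints/dqn_lunarlander.pt",
        overwrite_checkpoints=True,
    )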

agilerl.utils.utils.tournament_selection_and_mutation(population: List[EvolvableAlgorithm], tournament: TournamentSelection, mutation: Mutations, env_name: str, algo: str | None = None, elite_path: str | None = None, save_elite: bool = False, accelerator: Accelerator | None = None, language_model: bool | None = False) → List[EvolvableAlgorithm]

Performs tournament selection and mutation on a population of agents.

Parameters:
  • population (list[PopulationType]) – Population of agents

  • tournament (TournamentSelection) – Tournament selection object

  • mutation (Mutations) – Mutation object

  • env_name (str) – Environment name

  • algo (str, optional) – RL algorithm name, defaults to None

  • elite_path (str, optional) – Path to save elite agent, defaults to None

  • save_elite (bool, optional) – Flag to save elite agent, defaults to False

  • accelerator (accelerate.Accelerator(), optional) – Accelerator for distributed computing, defaults to None

  • language_model (bool, optional) – Flag to indicate if the environment is a language model, defaults to False

Returns:

Population of agents after tournament selection and mutation

Return type:

list[PopulationType]
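
A sketch of one evolutionary step; the TournamentSelection and Mutations constructor arguments shown are indicative of the AgileRL HPO API and may differ between versions, and pop is assumed to be an evaluated population with fitnesses assigned:

    from agilerl.hpo.mutation import Mutations
    from agilerl.hpo.tournament import TournamentSelection
    from agilerl.utils.utils import tournament_selection_and_mutation

    tournament = TournamentSelection(
        tournament_size=2, elitism=True, population_size=4, eval_loop=1
    )
    mutations = Mutations(
        no_mutation=0.4, architecture=0.2, new_layer_prob=0.2,
        parameters=0.2, activation=0.0, rl_hp=0.2, device="cpu",
    )

    # Select the fittest agents and mutate them to form the next generation
    pop = tournament_selection_and_mutation(
        population=pop,
        tournament=tournament,
        mutation=mutations,
        env_name="LunarLander-v2",
        save_elite=False,
    )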

agilerl.utils.utils.init_wandb(algo: str, env_name: str, init_hyperparams: Dict[str, Any] | None = None, mutation_hyperparams: Dict[str, Any] | None = None, wandb_api_key: str | None = None, accelerator: Accelerator | None = None, project: str = 'AgileRL') → None

Initializes wandb for logging hyperparameters and run metadata.

Parameters:
  • algo (str) – RL algorithm

  • env_name (str) – Environment name

  • init_hyperparams (dict, optional) – Initial hyperparameters, defaults to None

  • mutation_hyperparams (dict, optional) – Mutation hyperparameters, defaults to None

  • wandb_api_key (str, optional) – Wandb API key, defaults to None

  • accelerator (accelerate.Accelerator(), optional) – Accelerator for distributed computing, defaults to None

  • project (str, optional) – Wandb project name, defaults to ‘AgileRL’
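
A sketch assuming a wandb account is already configured on the machine (otherwise pass wandb_api_key):

    from agilerl.utils.utils import init_wandb

    init_wandb(
        algo="DQN",
        env_name="LunarLander-v2",
        init_hyperparams={"BATCH_SIZE": 64, "LR": 1e-3},  # illustrative subset
        project="AgileRL",
    )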

agilerl.utils.utils.calculate_vectorized_scores(rewards: ndarray, terminations: ndarray, include_unterminated: bool = False, only_first_episode: bool = True) → List[float]

Calculate the vectorized scores for episodes based on rewards and terminations.

Parameters:
  • rewards (np.ndarray) – Array of rewards for each environment.

  • terminations (np.ndarray) – Array indicating termination points for each environment.

  • include_unterminated (bool, optional) – Whether to include rewards from unterminated episodes, defaults to False.

  • only_first_episode (bool, optional) – Whether to consider only the first episode, defaults to True.

Returns:

List of episode rewards.

Return type:

list[float]
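
A small worked sketch; the (num_steps, num_envs) array layout is an assumption based on how step data is typically stacked in training loops:

    import numpy as np
    from agilerl.utils.utils import calculate_vectorized_scores

    # 3 steps, 2 environments; env 0 terminates at step 2, env 1 at step 1
    rewards = np.array([[1.0, 0.5], [1.0, 0.5], [1.0, 0.5]])
    terminations = np.array([[0, 0], [0, 1], [1, 0]])

    scores = calculate_vectorized_scores(rewards, terminations)
    # Expected: the return of each environment's first completed episode,
    # i.e. [3.0, 1.0] under the layout assumed above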

agilerl.utils.utils.print_hyperparams(pop: List[EvolvableAlgorithm]) → None

Prints current hyperparameters of agents in a population and their fitnesses.

Parameters:

pop (list[EvolvableAlgorithm]) – Population of agents

agilerl.utils.utils.plot_population_score(pop: List[EvolvableAlgorithm]) → None

Plots the fitness scores of agents in a population.

Parameters:

pop (list[EvolvableAlgorithm]) – Population of agents
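
A usage sketch for both inspection helpers (pop is assumed to be a trained population; plotting presumably uses matplotlib):

    from agilerl.utils.utils import plot_population_score, print_hyperparams

    print_hyperparams(pop)      # prints each agent's hyperparameters and fitnesses
    plot_population_score(pop)  # plots fitness scores across the population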