General Utils¶
- agilerl.utils.utils.make_vect_envs(env_name: str | None = None, num_envs=1, *, make_env: Callable | None = None, should_async_vector: bool = True, **env_kwargs)¶
Returns vectorized gym environments, using asynchronous vectorization by default.
- Parameters:
env_name (str) – Gym environment name
num_envs (int, optional) – Number of vectorized environments, defaults to 1
make_env (Callable, optional) – Function that creates a gym environment; defaults to gym.make(env_name) if not provided
should_async_vector (bool, optional) – Whether to use asynchronous vectorized environments, defaults to True
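A minimal usage sketch, assuming a registered Gymnasium environment ID (the ID below is illustrative):

```python
import gymnasium as gym

from agilerl.utils.utils import make_vect_envs

# Create 8 vectorized copies of an environment by name
# (asynchronous vectorization by default).
env = make_vect_envs("LunarLander-v2", num_envs=8)

# Equivalently, pass a factory instead of an environment name.
env = make_vect_envs(make_env=lambda: gym.make("LunarLander-v2"), num_envs=8)
```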
- agilerl.utils.utils.make_multi_agent_vect_envs(env: Callable[[], ParallelEnv], num_envs: int = 1, **env_kwargs: Any) AsyncPettingZooVecEnv ¶
Returns async-vectorized PettingZoo parallel environments.
- Parameters:
env (Callable[[], ParallelEnv]) – Function that returns a PettingZoo parallel environment
num_envs (int, optional) – Number of vectorized environments, defaults to 1
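A short sketch, assuming a PettingZoo MPE environment is installed (the environment and its keyword arguments are illustrative):

```python
from pettingzoo.mpe import simple_speaker_listener_v4

from agilerl.utils.utils import make_multi_agent_vect_envs

# Pass a callable that builds the parallel environment; extra keyword
# arguments are forwarded to each sub-environment.
env = make_multi_agent_vect_envs(
    simple_speaker_listener_v4.parallel_env,
    num_envs=8,
    continuous_actions=True,
)
```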
- agilerl.utils.utils.make_skill_vect_envs(env_name: str, skill: Any, num_envs: int = 1) AsyncVectorEnv ¶
Returns async-vectorized gym environments.
- Parameters:
env_name (str) – Gym environment name
skill (agilerl.wrappers.learning.Skill) – Skill wrapper to apply to environment
num_envs (int, optional) – Number of vectorized environments, defaults to 1
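A sketch only; `StabilizeSkill` is a hypothetical subclass of agilerl.wrappers.learning.Skill standing in for your own skill wrapper:

```python
from agilerl.utils.utils import make_skill_vect_envs
from agilerl.wrappers.learning import Skill


class StabilizeSkill(Skill):
    # Hypothetical skill: override Skill's reward shaping here.
    pass


# The skill wrapper is applied to each vectorized sub-environment.
env = make_skill_vect_envs("LunarLander-v2", skill=StabilizeSkill, num_envs=8)
```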
- agilerl.utils.utils.observation_space_channels_to_first(observation_space: Box | Discrete | Dict | Tuple) Box | Discrete | Dict | Tuple ¶
Swaps the channel order of an observation space from [H, W, C] -> [C, H, W].
- Parameters:
observation_space (spaces.Box, spaces.Dict, spaces.Tuple, spaces.Discrete) – Observation space
- Returns:
Observation space with swapped channels
- Return type:
spaces.Box, spaces.Dict, spaces.Tuple, spaces.Discrete
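For example, converting an image observation space for a channels-first CNN (a minimal sketch):

```python
import numpy as np
from gymnasium import spaces

from agilerl.utils.utils import observation_space_channels_to_first

# An 84x84 RGB observation space in [H, W, C] order.
hwc_space = spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8)

# Swapped to the [C, H, W] order expected by PyTorch convolutions.
chw_space = observation_space_channels_to_first(hwc_space)
print(chw_space.shape)  # (3, 84, 84)
```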
- agilerl.utils.utils.create_population(algo: str, observation_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | List[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary], action_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | List[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary], net_config: Dict[str, Any] | None, INIT_HP: Dict[str, Any], hp_config: HyperparameterConfig | None = None, actor_network: EvolvableModule | None = None, critic_network: EvolvableModule | None = None, agent_wrapper: Callable | None = None, wrapper_kwargs: Dict[str, Any] | None = None, population_size: int = 1, num_envs: int = 1, device: str = 'cpu', accelerator: Any | None = None, torch_compiler: Any | None = None) List[EvolvableAlgorithm] ¶
Returns population of identical agents.
- Parameters:
algo (str) – RL algorithm
observation_space (spaces.Space) – Observation space
action_space (spaces.Space) – Action space
net_config (dict or None) – Network configuration
INIT_HP (dict) – Initial hyperparameters
hp_config (HyperparameterConfig, optional) – Choice of algorithm hyperparameters to mutate during training, defaults to None
actor_network (nn.Module, optional) – Custom actor network, defaults to None
critic_network (nn.Module, optional) – Custom critic network, defaults to None
agent_wrapper (Callable, optional) – Wrapper to apply to each agent in the population, defaults to None
wrapper_kwargs (dict, optional) – Keyword arguments for the agent wrapper, defaults to None
population_size (int, optional) – Number of agents in population, defaults to 1
num_envs (int, optional) – Number of vectorized environments, defaults to 1
device (str, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to ‘cpu’
accelerator (accelerate.Accelerator, optional) – Accelerator for distributed computing, defaults to None
torch_compiler (Any, optional) – Torch compiler, defaults to None
- Returns:
Population of agents
- Return type:
list[EvolvableAlgorithm]
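A minimal sketch of creating a population; the INIT_HP keys shown are illustrative placeholders rather than a complete configuration for any algorithm:

```python
import torch
from gymnasium import spaces

from agilerl.utils.utils import create_population

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Discrete(4)

# Illustrative hyperparameters only; consult the algorithm's
# documentation for the keys it actually requires.
INIT_HP = {"BATCH_SIZE": 64, "LR": 1e-3, "GAMMA": 0.99}

pop = create_population(
    algo="DQN",
    observation_space=observation_space,
    action_space=action_space,
    net_config=None,  # fall back to the default network architecture
    INIT_HP=INIT_HP,
    population_size=4,
    num_envs=8,
    device="cuda" if torch.cuda.is_available() else "cpu",
)
```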
- agilerl.utils.utils.save_population_checkpoint(population: List[EvolvableAlgorithm], save_path: str, overwrite_checkpoints: bool, accelerator: Accelerator | None = None) None ¶
Saves checkpoint of population of agents.
- Parameters:
population (list[EvolvableAlgorithm]) – Population of agents
save_path (str) – Path to save population checkpoint
overwrite_checkpoints (bool) – Flag to overwrite existing checkpoints
accelerator (accelerate.Accelerator, optional) – Accelerator for distributed computing, defaults to None
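A brief sketch, assuming `pop` is a population returned by create_population (the path is illustrative):

```python
from agilerl.utils.utils import save_population_checkpoint

# Write a checkpoint for every agent in the population; with
# overwrite_checkpoints=True, existing files at the path are replaced.
save_population_checkpoint(
    population=pop,
    save_path="checkpoints/population.pt",
    overwrite_checkpoints=True,
)
```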
- agilerl.utils.utils.tournament_selection_and_mutation(population: List[EvolvableAlgorithm], tournament: TournamentSelection, mutation: Mutations, env_name: str, algo: str | None = None, elite_path: str | None = None, save_elite: bool = False, accelerator: Accelerator | None = None, language_model: bool | None = False) List[EvolvableAlgorithm] ¶
Performs tournament selection and mutation on a population of agents.
- Parameters:
population (list[PopulationType]) – Population of agents
tournament (TournamentSelection) – Tournament selection object
mutation (Mutations) – Mutation object
env_name (str) – Environment name
algo (str, optional) – RL algorithm name, defaults to None
elite_path (str, optional) – Path to save elite agent, defaults to None
save_elite (bool, optional) – Flag to save elite agent, defaults to False
accelerator (accelerate.Accelerator, optional) – Accelerator for distributed computing, defaults to None
language_model (bool, optional) – Flag to indicate if the environment is a language model, defaults to False
- Returns:
Population of agents after tournament selection and mutation
- Return type:
list[PopulationType]
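A sketch of one evolutionary step, assuming `pop`, `tournament` (a TournamentSelection object) and `mutations` (a Mutations object) have already been configured as described in the HPO documentation:

```python
from agilerl.utils.utils import tournament_selection_and_mutation

# Select the next generation via tournament selection, mutate it, and
# optionally save the elite agent to disk.
pop = tournament_selection_and_mutation(
    population=pop,
    tournament=tournament,
    mutation=mutations,
    env_name="LunarLander-v2",
    save_elite=True,
    elite_path="elite_agent",  # illustrative path
)
```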
- agilerl.utils.utils.init_wandb(algo: str, env_name: str, init_hyperparams: Dict[str, Any] | None = None, mutation_hyperparams: Dict[str, Any] | None = None, wandb_api_key: str | None = None, accelerator: Accelerator | None = None, project: str = 'AgileRL') None ¶
Initializes wandb for logging hyperparameters and run metadata.
- Parameters:
algo (str) – RL algorithm
env_name (str) – Environment name
init_hyperparams (dict, optional) – Initial hyperparameters, defaults to None
mutation_hyperparams (dict, optional) – Mutation hyperparameters, defaults to None
wandb_api_key (str, optional) – Wandb API key, defaults to None
accelerator (accelerate.Accelerator, optional) – Accelerator for distributed computing, defaults to None
project (str, optional) – Wandb project name, defaults to ‘AgileRL’
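A minimal sketch; `INIT_HP` here is the same hyperparameter dictionary passed to create_population:

```python
from agilerl.utils.utils import init_wandb

# Starts a wandb run tagged with the algorithm and environment.
# Without an explicit API key, wandb's normal login flow is used.
init_wandb(
    algo="DQN",
    env_name="LunarLander-v2",
    init_hyperparams=INIT_HP,
    project="AgileRL",
)
```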
- agilerl.utils.utils.calculate_vectorized_scores(rewards: ndarray, terminations: ndarray, include_unterminated: bool = False, only_first_episode: bool = True) List[float] ¶
Calculate the vectorized scores for episodes based on rewards and terminations.
- Parameters:
rewards (np.ndarray) – Array of rewards for each environment.
terminations (np.ndarray) – Array indicating termination points for each environment.
include_unterminated (bool, optional) – Whether to include rewards from unterminated episodes, defaults to False.
only_first_episode (bool, optional) – Whether to consider only the first episode, defaults to True.
- Returns:
List of episode rewards.
- Return type:
list[float]
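A small worked example with two environments and four steps; the expected output assumes the termination step's reward is included in the episode score:

```python
import numpy as np

from agilerl.utils.utils import calculate_vectorized_scores

# One row per environment is assumed here; check how your rollout
# buffer lays out (num_envs, num_steps) before relying on this shape.
rewards = np.array([[1.0, 1.0, 1.0, 0.0],
                    [0.5, 0.5, 0.5, 0.5]])
terminations = np.array([[0, 0, 1, 0],
                         [0, 0, 0, 1]])

# With the defaults, only each environment's first completed episode
# is scored, e.g. [3.0, 2.0] for the arrays above.
scores = calculate_vectorized_scores(rewards, terminations)
```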
- agilerl.utils.utils.print_hyperparams(pop: List[EvolvableAlgorithm]) None ¶
Prints current hyperparameters of agents in a population and their fitnesses.
- Parameters:
pop (list[EvolvableAlgorithm]) – Population of agents
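Typically called between generations to inspect how mutations have changed each agent (a minimal sketch, with `pop` from create_population):

```python
from agilerl.utils.utils import print_hyperparams

# Print each agent's current hyperparameters and recorded fitnesses.
print_hyperparams(pop)
```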
- agilerl.utils.utils.plot_population_score(pop: List[EvolvableAlgorithm]) None ¶
Plots the fitness scores of agents in a population.
- Parameters:
pop (list[EvolvableAlgorithm]) – Population of agents
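A minimal sketch; depending on your matplotlib backend you may need an explicit show() call:

```python
import matplotlib.pyplot as plt

from agilerl.utils.utils import plot_population_score

# Plot fitness over training for every agent in the population.
plot_population_score(pop)
plt.show()
```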