Algorithm Utils

Space and Observation Utilities

agilerl.utils.algo_utils.get_input_size_from_space(observation_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | list[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary]) int | dict[str, int] | tuple[int, ...]

Return the dimension of the state space as it pertains to the underlying networks (i.e. the input size of the networks).

Parameters:

observation_space (spaces.Space or list[spaces.Space] or dict[str, spaces.Space]) – The observation space of the environment.

Returns:

The dimension of the state space.

Return type:

int | dict[str, int] | tuple[int, …]

agilerl.utils.algo_utils.get_output_size_from_space(action_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | list[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary]) int | dict[str, int] | tuple[int, ...]

Return the dimension of the action space as it pertains to the underlying networks (i.e. the output size of the networks).

Parameters:

action_space (spaces.Space or list[spaces.Space] or dict[str, spaces.Space]) – The action space of the environment.

Returns:

The dimension of the action space.

Return type:

int | dict[str, int] | tuple[int, …]

agilerl.utils.algo_utils.get_obs_shape(space: Space) tuple[int, ...] | dict[str, tuple[int, ...]]

Return the shape of the observation space.

Parameters:

space (spaces.Space) – Observation space

Returns:

Shape of the observation space

Return type:

tuple[int, …] | dict[str, tuple[int, …]]

agilerl.utils.algo_utils.get_num_actions(space: Space) int

Return the number of actions.

Parameters:

space (spaces.Space) – Action space

Returns:

Number of actions

Return type:

int

agilerl.utils.algo_utils.is_image_space(space: Space) bool

Check if the space is an image space. Dtype and number-of-channels checks are not performed.

Parameters:

space (spaces.Space) – Input space

Returns:

True if the space is an image space, False otherwise

Return type:

bool

agilerl.utils.algo_utils.concatenate_spaces(space_list: list[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary]) Space

Concatenates a list of spaces into a single space. If spaces correspond to images, we check that their shapes are the same and use the first space’s shape as the shape of the concatenated space.

Parameters:

space_list (list[SupportedObsSpaces]) – List of spaces to concatenate

Returns:

Concatenated space

Return type:

spaces.Space

Network and Model Utilities

agilerl.utils.algo_utils.share_encoder_parameters(policy: EvolvableNetworkProtocol, *others: EvolvableNetworkProtocol) None

Shares the encoder parameters between the policy and any number of other networks.

Parameters:
  • policy (EvolvableNetworkProtocol) – The policy network whose encoder parameters will be used.

  • others (EvolvableNetworkProtocol) – The other networks whose encoder parameters will be pinned to the policy.

agilerl.utils.algo_utils.get_hidden_states_shape_from_model(model: Module) dict[str, int]

Loop through all modules in the model and collect the items of every hidden_state_architecture attribute found into a single dictionary. The result can be used to initialize the hidden states of the model.

Parameters:

model (nn.Module) – The model to get the hidden states from.

Returns:

The hidden states shape from the model.

Return type:

dict[str, int]

agilerl.utils.algo_utils.format_shared_critic_encoder(encoder_configs: dict[str, dict[str, Any] | Any]) dict[str, Any]

Build the config for the shared critic (i.e. EvolvableMultiInput) from the encoder configs of all sub-agents. This dictionary is built when extracting the net config passed by the user in MultiAgentAlgorithm.extract_net_config.

Note

If the user specified different MLP configurations for different sub-agents or groups, the deepest MLP config is used for the shared critic's EvolvableMLP.

Parameters:

encoder_configs (dict[str, Any]) – Network configuration

Returns:

Formatted shared critic encoder config

Return type:

dict[str, Any]

agilerl.utils.algo_utils.get_deepest_head_config(net_config: dict[str, dict[str, Any] | Any], agent_ids: list[str]) dict[str, dict[str, Any] | Any]

Return the deepest head config from the nested net config.

Parameters:
  • net_config (NetConfigType) – Network configuration

  • agent_ids (list[str]) – List of agent IDs

Returns:

Deepest head config
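As an illustration of what "deepest" can mean here, the sketch below selects the head config with the most hidden layers. It is a hypothetical stand-in, not AgileRL's implementation: it assumes net_config maps each agent ID to a dict with a "head_config" entry, and that the "hidden_size" list in each head config holds one width per hidden layer.

```python
from typing import Any


def deepest_head_config_sketch(
    net_config: dict[str, dict[str, Any]], agent_ids: list[str]
) -> dict[str, Any]:
    """Hypothetical helper: return the head config with the most hidden layers."""
    # Gather the head config of each listed agent that defines one
    head_configs = [
        net_config[agent_id]["head_config"]
        for agent_id in agent_ids
        if "head_config" in net_config.get(agent_id, {})
    ]
    if not head_configs:
        raise ValueError("No head_config found for the given agent IDs")
    # Depth = number of entries in hidden_size; max() keeps the first config on ties
    return max(head_configs, key=lambda cfg: len(cfg.get("hidden_size", [])))


net_config = {
    "agent_0": {"head_config": {"hidden_size": [64]}},
    "agent_1": {"head_config": {"hidden_size": [64, 64, 32]}},
}
print(deepest_head_config_sketch(net_config, ["agent_0", "agent_1"]))
# {'hidden_size': [64, 64, 32]}
```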

agilerl.utils.algo_utils.is_peft_model(model: Module) bool

Check if a model is a PEFT model.

Parameters:

model (nn.Module) – Model to check

Returns:

True if the model is a PEFT model, False otherwise

Return type:

bool

agilerl.utils.algo_utils.clone_llm(original_model: PeftModel | PreTrainedModel | DummyEvolvable, zero_stage: int, state_dict: dict[str, Tensor] | None = None) PeftModel | PreTrainedModel

Clone the actor.

Parameters:
  • original_model (PreTrainedModelType) – Model to clone

  • zero_stage (int) – ZeRO stage to use

  • state_dict (dict[str, torch.Tensor] | None, optional) – State dict to load, defaults to None

Returns:

Cloned model

Observation Processing

agilerl.utils.algo_utils.obs_channels_to_first(observation: ndarray | dict[str, ndarray] | tuple[ndarray, ...], expand_dims: bool = False) ndarray | dict[str, ndarray] | tuple[ndarray, ...]

Convert an observation from channels-last to channels-first format.

Parameters:
  • observation (NumpyObsType) – Observation

  • expand_dims (bool, optional) – If True, expand the dimensions of the observation, defaults to False

Returns:

Observation in channels-first format

Return type:

NumpyObsType
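The axis reordering itself is simple. The following sketch shows it for a single image stored as nested Python lists in (height, width, channels) order. It is illustrative only: the documented function works on NumPy arrays (where the equivalent move is np.moveaxis(obs, -1, -3)) and also handles dict and tuple observations. Treating expand_dims as adding a leading batch axis is an assumption.

```python
Image = list[list[list[float]]]


def channels_to_first_sketch(image: Image, expand_dims: bool = False) -> list:
    """Reorder an (H, W, C) nested-list image into (C, H, W)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    chw = [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]
    # Assumption: expand_dims adds a leading batch axis of size 1
    return [chw] if expand_dims else chw


# A 2x2 RGB image: pixel (h, w) holds its three channel values
image = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
print(channels_to_first_sketch(image))
# [[[1, 4], [7, 10]], [[2, 5], [8, 11]], [[3, 6], [9, 12]]]
```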

agilerl.utils.algo_utils.obs_to_tensor(obs: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, device: str | device) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]

Move the observation to the given device as a PyTorch tensor.

Parameters:
  • obs (ObservationType) – Observation to convert

  • device (str | torch.device) – PyTorch device

Returns:

PyTorch tensor of the observation on the given device.

Return type:

TorchObsType

agilerl.utils.algo_utils.get_vect_dim(observation: ndarray | dict[str, ndarray] | tuple[ndarray, ...], observation_space: Space) int

Return the number of vectorized environments given an observation and its corresponding space.

Parameters:
  • observation (NumpyObsType) – Observation

  • observation_space (spaces.Space) – Observation space

Returns:

Number of vectorized environments
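One plausible way to infer the count, sketched below on plain shape tuples, is to compare the observation's number of dimensions with the space's: one extra leading axis indicates a batch of environments. This rule is an assumption for Box-like spaces with a fixed shape; AgileRL's function also handles dict and tuple observations and may apply different rules.

```python
def vect_dim_sketch(obs_shape: tuple[int, ...], space_shape: tuple[int, ...]) -> int:
    """Hypothetical helper: infer the number of vectorized environments from shapes."""
    # One extra leading axis means the observation is batched over environments
    if len(obs_shape) == len(space_shape) + 1:
        return obs_shape[0]
    # Same rank as the space: a single, unbatched observation
    if len(obs_shape) == len(space_shape):
        return 1
    raise ValueError(f"Unexpected observation shape {obs_shape} for space shape {space_shape}")


print(vect_dim_sketch((8, 4), (4,)))  # 8 environments
print(vect_dim_sketch((4,), (4,)))    # a single environment
```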

agilerl.utils.algo_utils.add_placeholder_value(obs: Tensor, placeholder_value: float) Tensor

Add placeholder value to observation.

Parameters:
  • obs (torch.Tensor) – Observation

  • placeholder_value (float) – Placeholder value

Returns:

Observation with placeholder value

Return type:

torch.Tensor

agilerl.utils.algo_utils.maybe_add_batch_dim(array_like: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, space: Space, actions: bool = False) ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts
agilerl.utils.algo_utils.maybe_add_batch_dim(array_like: ndarray, space: Space, actions: bool = False) ndarray
agilerl.utils.algo_utils.maybe_add_batch_dim(array_like: Tensor, space: Space, actions: bool = False) Tensor

Add batch dimension if necessary.

Parameters:
  • array_like (ObservationType) – Array or tensor

  • space (spaces.Space) – Observation or action space

  • actions (bool, optional) – Whether the array is an action, defaults to False

Returns:

Array or tensor with a batch dimension

Return type:

ObservationType

agilerl.utils.algo_utils.preprocess_observation(observation_space: Space, observation: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]
agilerl.utils.algo_utils.preprocess_observation(observation_space: Dict, observation: dict[str, ndarray | Tensor], device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) dict[str, Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]]
agilerl.utils.algo_utils.preprocess_observation(observation_space: Tuple, observation: tuple[ndarray | Tensor, ...], device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) tuple[Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], ...]
agilerl.utils.algo_utils.preprocess_observation(observation_space: Box, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor
agilerl.utils.algo_utils.preprocess_observation(observation_space: Discrete, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor
agilerl.utils.algo_utils.preprocess_observation(observation_space: MultiDiscrete, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor
agilerl.utils.algo_utils.preprocess_observation(observation_space: MultiBinary, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor

Preprocess observations for a forward pass through a neural network.

Parameters:
  • observation_space (spaces.Space) – The observation space of the environment

  • observation (ObservationType) – Observations of environment

  • device (str | torch.device, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to “cpu”

  • normalize_images (bool, optional) – Normalize images from [0, 255] to [0, 1], defaults to True

  • placeholder_value (Any | None, optional) – The value to use as placeholder for missing observations, defaults to None.

Returns:

Preprocessed observations

Return type:

torch.Tensor[float] or dict[str, torch.Tensor[float]] or tuple[torch.Tensor[float], …]

agilerl.utils.algo_utils.apply_image_normalization(observation: ndarray | Tensor, observation_space: Box) ndarray | Tensor

Normalize images using min-max scaling based on the bounds of the observation space.

Parameters:
  • observation (ArrayOrTensor) – Observation

  • observation_space (spaces.Box) – Observation space

Returns:

Normalized observation

Return type:

ArrayOrTensor
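Min-max scaling maps each value x to (x - low) / (high - low), so a uint8 image with bounds 0 and 255 ends up in [0, 1]. The sketch below applies that formula to a flat list, assuming scalar bounds; the documented function instead takes its bounds from observation_space and operates on arrays or tensors.

```python
def minmax_normalize_sketch(values: list[float], low: float, high: float) -> list[float]:
    """Scale values from [low, high] into [0, 1] with min-max scaling."""
    if high <= low:
        raise ValueError("high must be greater than low")
    span = high - low
    return [(x - low) / span for x in values]


print(minmax_normalize_sketch([0, 51, 255], low=0, high=255))
# [0.0, 0.2, 1.0]
```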

Experience Handling

agilerl.utils.algo_utils.get_experiences_samples(minibatch_indices: ndarray, *experiences: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]) tuple[Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], ...]

Sample experiences given minibatch indices.

Parameters:
  • minibatch_indices (numpy.ndarray[int]) – Minibatch indices

  • experiences (tuple[torch.Tensor[float], ...]) – Experiences to sample from

Returns:

Sampled experiences

Return type:

tuple[torch.Tensor[float], …]
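The important property is that every experience is indexed with the same minibatch indices, so sampled states, actions and rewards stay aligned. The sketch below illustrates this with plain lists; it is not AgileRL's implementation, which works with tensors, tuples, and dicts of tensors.

```python
from collections.abc import Sequence
from typing import Any


def sample_experiences_sketch(
    minibatch_indices: Sequence[int], *experiences: Sequence[Any]
) -> tuple[list[Any], ...]:
    """Select the same positions from each experience sequence."""
    return tuple([experience[i] for i in minibatch_indices] for experience in experiences)


states = [10, 11, 12, 13]
actions = [0, 1, 0, 1]
rewards = [0.5, 1.0, -1.0, 0.0]
sampled_states, sampled_actions, sampled_rewards = sample_experiences_sketch(
    [3, 1], states, actions, rewards
)
print(sampled_states, sampled_actions, sampled_rewards)
# [13, 11] [1, 1] [0.0, 1.0]
```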

agilerl.utils.algo_utils.stack_experiences(*experiences: list[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, to_torch: bool = True) tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...]

Stacks experiences into a single array or tensor.

Parameters:
  • experiences (list[numpy.ndarray[float]] or list[dict[str, numpy.ndarray[float]]]) – Experiences to stack

  • to_torch (bool, optional) – If True, convert the stacked experiences to a torch tensor, defaults to True

Returns:

Stacked experiences

Return type:

tuple[ArrayOrTensor, …]

agilerl.utils.algo_utils.stack_and_pad_experiences(*experiences: list[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, padding_values: list[int | float | bool], padding_side: str = 'right', device: str | None = None) tuple[ndarray | Tensor, ...]

Stacks experiences into a single tensor, padding them to the maximum length.

Parameters:
  • experiences (list[numpy.ndarray[float]] or list[dict[str, numpy.ndarray[float]]]) – Experiences to stack

  • padding_values (list[int | float | bool]) – Values used to pad the experiences

  • padding_side (str, optional) – Side to pad on, defaults to “right”

  • device (str | None, optional) – Device for the stacked experiences, defaults to None

Returns:

Stacked experiences

Return type:

tuple[ArrayOrTensor, …]
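Padding to the maximum length works as in the sketch below, which handles a single experience made of variable-length lists and a single padding value. It only illustrates the padding_side behavior; the documented function accepts several experiences together with a list of padding values and returns arrays or tensors.

```python
def pad_and_stack_sketch(
    sequences: list[list[float]], padding_value: float, padding_side: str = "right"
) -> list[list[float]]:
    """Pad every sequence to the longest length, on the requested side."""
    # This sketch only accepts "left" or "right"
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    stacked = []
    for seq in sequences:
        padding = [padding_value] * (max_len - len(seq))
        stacked.append(seq + padding if padding_side == "right" else padding + seq)
    return stacked


print(pad_and_stack_sketch([[1, 2, 3], [4]], padding_value=0))
# [[1, 2, 3], [4, 0, 0]]
print(pad_and_stack_sketch([[1, 2, 3], [4]], padding_value=0, padding_side="left"))
# [[1, 2, 3], [0, 0, 4]]
```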

agilerl.utils.algo_utils.flatten_experiences(*experiences: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts) tuple[ndarray | Tensor, ...]

Flattens experiences into a single array or tensor.

Parameters:

experiences (tuple[numpy.ndarray[float], ...] or tuple[torch.Tensor[float], ...]) – Experiences to flatten

Returns:

Flattened experiences

Return type:

tuple[numpy.ndarray[float], …] or tuple[torch.Tensor[float], …]

agilerl.utils.algo_utils.is_vectorized_experiences(*experiences: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts) bool

Check if experiences are vectorized.

Parameters:

experiences (tuple[numpy.ndarray[float], ...] or tuple[torch.Tensor[float], ...]) – Experiences to check

Returns:

True if experiences are vectorized, False otherwise

Return type:

bool

agilerl.utils.algo_utils.vectorize_experiences_by_agent(experiences: dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...], dim: int = 1) Tensor | dict[str, Tensor] | tuple[Tensor, ...]

Reorganizes experiences into a tensor, vectorized by time step.

Example input: {'agent_0': [[1, 2, 3, 4]], 'agent_1': [[5, 6, 7, 8]]}

Example output: torch.Tensor([[1, 2, 3, 4, 5, 6, 7, 8]])

Parameters:
  • experiences (ExperiencesType) – Dictionary of experiences keyed by agent ID, for agents that share a policy.

  • dim (int, optional) – Dimension to stack along, defaults to 1

Returns:

Tensor, dict of tensors, or tuple of tensors of experiences, stacked along provided dimension

Return type:

torch.Tensor | dict[str, torch.Tensor] | tuple[torch.Tensor, …]
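The example above can be reproduced with plain lists: for each time step, the rows of all agents are joined in agent order. The sketch below assumes every agent has the same number of time steps and relies on dict insertion order; it does not model the dim argument or the tensor conversion performed by the documented function.

```python
def vectorize_by_agent_sketch(experiences: dict[str, list[list[float]]]) -> list[list[float]]:
    """Join each agent's row for the same time step, in agent (insertion) order."""
    agent_rows = list(experiences.values())
    num_steps = len(agent_rows[0])
    return [
        [value for rows in agent_rows for value in rows[t]]
        for t in range(num_steps)
    ]


print(vectorize_by_agent_sketch({"agent_0": [[1, 2, 3, 4]], "agent_1": [[5, 6, 7, 8]]}))
# [[1, 2, 3, 4, 5, 6, 7, 8]]
```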

agilerl.utils.algo_utils.experience_to_tensors(experience: dict | tuple | ndarray, space: Space, actions: bool = False) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]

Convert an experience to PyTorch tensors.

Parameters:
  • experience (dict | tuple | np.ndarray) – Experience to convert

  • space (spaces.Space) – Space of the experience

  • actions (bool, optional) – Whether the experience is an action, defaults to False

Returns:

Experience as tensors

Return type:

TorchObsType

agilerl.utils.algo_utils.concatenate_tensors(tensors: list[Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]]) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]

Concatenate tensors along the first dimension.

Parameters:

tensors (list[TorchObsType]) – List of tensors to concatenate

Returns:

Concatenated tensor

Return type:

TorchObsType

agilerl.utils.algo_utils.reshape_from_space(tensor: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], space: Space) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]

Reshape a tensor to match the given space.

Parameters:
  • tensor (TorchObsType) – Tensor to reshape

  • space (spaces.Space) – Space to reshape tensor to

Returns:

Reshaped tensor

Return type:

TorchObsType

agilerl.utils.algo_utils.concatenate_experiences_into_batches(experiences: dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...], space: Space, actions: bool = False) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]

Reorganizes experiences into a batched tensor.

Example input: {'agent_0': [[[…1], […2]], [[…5], […6]]], 'agent_1': [[[…3], […4]], [[…7], […8]]]}

Example output: torch.Tensor([…1], […2], […3], […4], […5], […6], […7], […8])

Parameters:
  • experiences (ExperiencesType) – Dictionary of experiences keyed by agent ID, for agents that share a policy.

  • space (spaces.Space) – Observation, action, or other space whose structure is preserved

  • actions (bool, optional) – Whether the experiences are actions, defaults to False

Returns:

Tensor, dict of tensors, or tuple of tensors of experiences, stacked along first dimension, with shape (num_experiences, *shape)

Return type:

torch.Tensor | dict[str, torch.Tensor] | tuple[torch.Tensor, …]
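The ordering in the example above can be reproduced with the sketch below. It reads each agent's experience as a list of groups (for instance one group per time step) whose entries are individual items, and emits the items of the first group for every agent before moving to the next group. This interpretation is inferred from the example; the documented function also reshapes the result according to space and returns tensors.

```python
from typing import Any


def batch_by_agent_sketch(experiences: dict[str, list[list[Any]]]) -> list[Any]:
    """Flatten {agent: [group][item]} into one batch: group first, then agent, then item."""
    per_agent = list(experiences.values())
    num_groups = len(per_agent[0])
    batch: list[Any] = []
    for g in range(num_groups):
        for groups in per_agent:
            batch.extend(groups[g])
    return batch


experiences = {
    "agent_0": [["obs1", "obs2"], ["obs5", "obs6"]],
    "agent_1": [["obs3", "obs4"], ["obs7", "obs8"]],
}
print(batch_by_agent_sketch(experiences))
# ['obs1', 'obs2', 'obs3', 'obs4', 'obs5', 'obs6', 'obs7', 'obs8']
```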

Checkpoint and Serialization

agilerl.utils.algo_utils.make_safe_deepcopies(*args: EvolvableModuleProtocol | list[EvolvableModuleProtocol]) list[EvolvableModuleProtocol]

Make deep copies of EvolvableModule objects and their attributes.

Parameters:

args (EvolvableModuleProtocol | list[EvolvableModuleProtocol]) – EvolvableModuleProtocol objects, or lists of them, to copy.

Returns:

Deep copies of the EvolvableModule objects and their attributes.

Return type:

list[EvolvableModuleProtocol]

agilerl.utils.algo_utils.isroutine(obj: object) bool

Check if an object is a routine, also considering methods wrapped by CudaGraphModule.

Parameters:

obj (object) – The object to check.

Returns:

True if the object is a routine, False otherwise.

Return type:

bool

agilerl.utils.algo_utils.recursive_check_module_attrs(obj: Any, networks_only: bool = False) bool

Recursively check if the object has any attributes that are EvolvableModuleProtocol objects or Optimizers, excluding metaclasses.

Parameters:
  • obj (Any) – The object to check for EvolvableModuleProtocol objects or Optimizers.

  • networks_only (bool, optional) – If True, only check for EvolvableModule objects, defaults to False

Returns:

True if the object has any attributes that are EvolvableModuleProtocol objects or Optimizers, False otherwise.

Return type:

bool

agilerl.utils.algo_utils.chkpt_attribute_to_device(chkpt_dict: dict[str, Tensor], device: str) dict[str, Any]

Place checkpoint attributes on device. Used when loading saved agents.

Parameters:
  • chkpt_dict (dict) – Checkpoint dictionary

  • device (str) – Device for accelerated computing, ‘cpu’ or ‘cuda’

Returns:

Checkpoint dictionary with attributes on device

Return type:

dict[str, Any]

agilerl.utils.algo_utils.key_in_nested_dict(nested_dict: dict[str, Any], target: str) bool

Determine whether a key is present at any level of a nested dictionary.

Parameters:
  • nested_dict (dict[str, dict[str, ...]]) – Nested dictionary

  • target (str) – Key to search for

Returns:

True if key is in nested dictionary, False otherwise

Return type:

bool
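A depth-first recursive search is enough to answer this question. The sketch below is an illustration, not AgileRL's code; it only descends into values that are themselves dicts, so keys inside lists of dicts are not found. The example config keys are hypothetical.

```python
from typing import Any


def key_in_nested_dict_sketch(nested_dict: dict[str, Any], target: str) -> bool:
    """Return True if target is a key at any depth of nested_dict."""
    for key, value in nested_dict.items():
        if key == target:
            return True
        # Recurse only into nested dicts
        if isinstance(value, dict) and key_in_nested_dict_sketch(value, target):
            return True
    return False


config = {"encoder_config": {"hidden_size": [64]}, "head_config": {"activation": "ReLU"}}
print(key_in_nested_dict_sketch(config, "activation"))  # True
print(key_in_nested_dict_sketch(config, "latent_dim"))  # False
```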

agilerl.utils.algo_utils.remove_compile_prefix(state_dict: dict[str, Any]) dict[str, Any]

Remove the _orig_mod prefix that torch.compile adds to state dict keys.

Parameters:

state_dict (dict) – Model state dict

Returns:

State dict with the prefix removed

Return type:

dict[str, Any]
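torch.compile wraps a module in an OptimizedModule, and the wrapper's state dict keys carry an _orig_mod. prefix. The sketch below strips that prefix from plain string keys with str.replace, which also covers compiled submodules where the prefix appears mid-key; whether AgileRL strips it the same way is an assumption.

```python
from typing import Any

COMPILE_PREFIX = "_orig_mod."


def remove_compile_prefix_sketch(state_dict: dict[str, Any]) -> dict[str, Any]:
    """Drop every occurrence of '_orig_mod.' from the state dict keys."""
    return {key.replace(COMPILE_PREFIX, ""): value for key, value in state_dict.items()}


compiled_state = {"_orig_mod.fc.weight": 1.0, "encoder._orig_mod.bias": 2.0, "head.bias": 3.0}
print(remove_compile_prefix_sketch(compiled_state))
# {'fc.weight': 1.0, 'encoder.bias': 2.0, 'head.bias': 3.0}
```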

agilerl.utils.algo_utils.module_checkpoint_dict(module: EvolvableModuleProtocol | ModuleDictProtocol | Optimizer | dict[str, Optimizer] | OptimizerWrapperProtocol, name: str) dict[str, Any]

Return a dictionary containing the module’s class, init dict, and state dict.

Parameters:
  • module (EvolvableAttributeType) – The module to checkpoint.

  • name (str) – The name of the attribute to checkpoint.

Returns:

A dictionary containing the module’s class, init dict, and state dict.

Return type:

dict[str, Any]

agilerl.utils.algo_utils.module_checkpoint_single(module: EvolvableModuleProtocol, name: str) dict[str, Any]

Return a dictionary containing the module’s class, init dict, and state dict.

Parameters:
  • module (EvolvableModuleProtocol) – The module to checkpoint.

  • name (str) – The name of the attribute to checkpoint.

Returns:

A dictionary containing the module’s class, init dict, and state dict.

Return type:

dict[str, Any]

agilerl.utils.algo_utils.module_checkpoint_multiagent(module: ModuleDictProtocol[T | EvolvableModuleProtocol | OptimizedModule | EvolvableNetworkProtocol], name: str) dict[str, Any]

Return a dictionary containing the module’s class, init dict, and state dict.

Parameters:
  • module (ModuleDictProtocol) – The module to checkpoint.

  • name (str) – The name of the attribute to checkpoint.

Returns:

A dictionary containing the module’s class, init dict, and state dict.

Return type:

dict[str, Any]

Learning Rate Scheduling

class agilerl.utils.algo_utils.CosineLRScheduleConfig(num_epochs: int, warmup_proportion: float)

Data class to configure a cosine LR scheduler.

agilerl.utils.algo_utils.create_warmup_cosine_scheduler(optimizer: Optimizer, config: CosineLRScheduleConfig, min_lr: float, max_lr: float) SequentialLR

Create a cosine annealing learning rate scheduler with warm-up.

Parameters:
  • optimizer (torch.optim.Optimizer) – Optimizer

  • config (CosineLRScheduleConfig) – LR scheduler config

  • min_lr (float) – Minimum learning rate

  • max_lr (float) – Maximum learning rate

Returns:

Sequential learning rate scheduler

Return type:

SequentialLR
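The shape of such a schedule can be sketched as a plain function of the epoch. The version below assumes a linear warm-up from min_lr to max_lr over the first int(num_epochs * warmup_proportion) epochs, followed by cosine annealing back to min_lr over the remaining epochs; the exact warm-up shape and step granularity of create_warmup_cosine_scheduler may differ. In PyTorch, a schedule like this is typically built by chaining LinearLR and CosineAnnealingLR with SequentialLR.

```python
import math


def warmup_cosine_lr(
    epoch: int, num_epochs: int, warmup_proportion: float, min_lr: float, max_lr: float
) -> float:
    """Learning rate at `epoch` for a linear warm-up followed by cosine annealing."""
    warmup_epochs = int(num_epochs * warmup_proportion)
    if epoch < warmup_epochs:
        # Linear ramp from min_lr towards max_lr
        return min_lr + (max_lr - min_lr) * epoch / warmup_epochs
    # Cosine decay from max_lr down to min_lr over the remaining epochs
    decay_epochs = max(1, num_epochs - warmup_epochs)
    progress = min(1.0, (epoch - warmup_epochs) / decay_epochs)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Rises to max_lr at epoch 2, then decays back to min_lr by epoch 10
for epoch in (0, 1, 2, 6, 10):
    print(epoch, warmup_cosine_lr(epoch, num_epochs=10, warmup_proportion=0.2, min_lr=0.0, max_lr=1e-3))
```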

File and Directory Utilities

agilerl.utils.algo_utils.remove_nested_files(files: list[str]) None

Remove nested files from a list of files.

Parameters:

files (list[str]) – List of files to remove nested files from