Algorithm Utils

Space and Observation Utilities

Return the dimension of the state space as it pertains to the underlying networks (i.e. the input size of the networks).

Parameters:: observation_space (spaces.Space or list[spaces.Space] or dict[str, spaces.Space]) – The observation space of the environment.
Returns:: The dimension of the state space.
Return type:: int | dict[str, int] | tuple[int, …]

Return the dimension of the action space as it pertains to the underlying networks (i.e. the output size of the networks).

Parameters:: action_space (spaces.Space or list[spaces.Space] or dict[str, spaces.Space]) – The action space of the environment.
Returns:: The dimension of the action space.
Return type:: int | dict[str, int] | tuple[int, …]

agilerl.utils.algo_utils.get_obs_shape(space: Space) → tuple[int, ...] | dict[str, tuple[int, ...]]

Return the shape of the observation space.

Parameters:: space (spaces.Space) – Observation space
Returns:: Shape of the observation space
Return type:: tuple[int, …] | dict[str, tuple[int, …]]

agilerl.utils.algo_utils.get_num_actions(space: Space) → int

Return the number of actions.

Parameters:: space (spaces.Space) – Action space
Returns:: Number of actions
Return type:: int

agilerl.utils.algo_utils.is_image_space(space: Space) → bool

Check if the space is an image space. The dtype and the number of channels are not checked.

Parameters:: space (spaces.Space) – Input space
Returns:: True if the space is an image space, False otherwise
Return type:: bool

Concatenates a list of spaces into a single space. If spaces correspond to images, we check that their shapes are the same and use the first space’s shape as the shape of the concatenated space.

Parameters:: space_list (list[SupportedObservationSpace]) – List of spaces to concatenate
Returns:: Concatenated space
Return type:: spaces.Space

Network and Model Utilities

agilerl.utils.algo_utils.share_encoder_parameters(policy: EvolvableNetworkProtocol, *others: EvolvableNetworkProtocol) → None

Shares the encoder parameters between the policy and any number of other networks.

Parameters:

policy (EvolvableNetworkProtocol) – The policy network whose encoder parameters will be used.
others (EvolvableNetworkProtocol) – The other networks whose encoder parameters will be pinned to the policy.

agilerl.utils.algo_utils.get_hidden_states_shape_from_model(model: Module) → dict[str, int]

Loops through all of the modules in the model and checks if they have a hidden_state_architecture attribute. If they do, it adds the items to a dictionary and returns it. This should make it easier to initialize the hidden states of the model.

Parameters:: model (nn.Module) – The model to get the hidden states from.
Returns:: The hidden states shape from the model.
Return type:: dict[str, int]

agilerl.utils.algo_utils.format_shared_critic_encoder(encoder_configs: dict[str, dict[str, Any] | Any]) → dict[str, Any]

Format the shared critic (i.e. EvolvableMultiInput) config from the available encoder configs from all of the sub-agents. This dictionary is built when extracting the net config passed by the user in MultiAgentAlgorithm.extract_net_config.

Note

If the user specified different MLP configurations for different sub-agents / groups, the deepest MLP config will be used for the shared critic's EvolvableMLP.

Parameters:: encoder_configs (dict[str, Any]) – Network configuration
Returns:: Formatted shared critic encoder config
Return type:: dict[str, Any]

agilerl.utils.algo_utils.get_deepest_head_config(net_config: dict[str, dict[str, Any] | Any], agent_ids: list[str]) → dict[str, dict[str, Any] | Any]

Return the deepest head config from the nested net config.

Parameters:

net_config (NetConfigType) – Network configuration
agent_ids (list[str]) – List of agent IDs

Returns:

Deepest head config

agilerl.utils.algo_utils.is_peft_model(model: Module) → bool

Check if a model is a PEFT model.

Parameters:: model (nn.Module) – Model to check
Returns:: True if the model is a PEFT model, False otherwise
Return type:: bool

agilerl.utils.algo_utils.clone_llm(original_model: PeftModel | PreTrainedModel | DummyEvolvable, zero_stage: int, state_dict: dict[str, Tensor] | None = None) → PeftModel | PreTrainedModel

Clone the given LLM, optionally loading a state dict into the clone.

Parameters:

original_model (PreTrainedModelType) – Model to clone
zero_stage (int) – ZeRO stage to use
state_dict (dict[str, torch.Tensor] | None, optional) – State dict to load, defaults to None

Returns:

Cloned model

Observation Processing

Move the observation to the given device as a PyTorch tensor.

Parameters:

obs (NumpyObsType) – Observation to convert
device (str | torch.device) – PyTorch device

Returns:

PyTorch tensor of the observation on the given device.

Return type:

TorchObsType

agilerl.utils.algo_utils.get_vect_dim(observation: ndarray | dict[str, ndarray] | tuple[ndarray, ...], observation_space: Space) → int

Return the number of vectorized environments given an observation and its corresponding space.

Parameters:

observation (NumpyObsType) – Observation
observation_space (spaces.Space) – Observation space

Returns:

Number of vectorized environments
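
The idea can be illustrated on shapes alone. The sketch below is not the library's implementation; the helper name `vect_dim_from_shapes` is hypothetical, and it assumes that a batched observation carries exactly one extra leading dimension compared with the shape declared by the space.

```python
from __future__ import annotations


def vect_dim_from_shapes(obs_shape: tuple[int, ...], space_shape: tuple[int, ...]) -> int:
    """Infer the number of parallel environments from shapes alone.

    Assumption: a batched observation has one extra leading axis
    relative to the shape declared by the space.
    """
    if len(obs_shape) == len(space_shape) + 1 and obs_shape[1:] == space_shape:
        return obs_shape[0]  # leading axis indexes the environments
    if obs_shape == space_shape:
        return 1  # a single, unbatched observation
    raise ValueError(f"shape {obs_shape} does not match space shape {space_shape}")


print(vect_dim_from_shapes((8, 4), (4,)))  # batched: 8 environments
print(vect_dim_from_shapes((4,), (4,)))    # unbatched: 1 environment
```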

agilerl.utils.algo_utils.add_placeholder_value(obs: Tensor, placeholder_value: float) → Tensor

Add placeholder value to observation.

Parameters:

obs (torch.Tensor) – Observation
placeholder_value (float) – Placeholder value

Returns:

Observation with placeholder value

Return type:

torch.Tensor

agilerl.utils.algo_utils.maybe_add_batch_dim(array_like: ndarray, space: Space, actions: bool = False) → ndarray

agilerl.utils.algo_utils.maybe_add_batch_dim(array_like: Tensor, space: Space, actions: bool = False) → Tensor

Add batch dimension if necessary.

Parameters:

array_like (ObservationType) – Array or tensor
space (spaces.Space) – Observation space
actions (bool, optional) – Whether the array is an action, defaults to False

Returns:

Array or tensor with batch dimension

Return type:

ObservationType

agilerl.utils.algo_utils.preprocess_observation(observation_space: Dict, observation: dict[str, ndarray | Tensor], device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None, swap_channels: bool = False) → dict[str, Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]]

agilerl.utils.algo_utils.preprocess_observation(observation_space: Tuple, observation: tuple[ndarray | Tensor, ...], device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None, swap_channels: bool = False) → tuple[Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], ...]

agilerl.utils.algo_utils.preprocess_observation(observation_space: Box, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None, swap_channels: bool = False) → Tensor

agilerl.utils.algo_utils.preprocess_observation(observation_space: Discrete, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None, swap_channels: bool = False) → Tensor

agilerl.utils.algo_utils.preprocess_observation(observation_space: MultiDiscrete, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None, swap_channels: bool = False) → Tensor

agilerl.utils.algo_utils.preprocess_observation(observation_space: MultiBinary, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None, swap_channels: bool = False) → Tensor

Preprocesses observations for forward pass through neural network.

Parameters:

observation_space (spaces.Space) – The observation space of the environment
observation (ObservationType) – Observations of environment
device (str | torch.device, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to “cpu”
normalize_images (bool, optional) – Normalize images from [0, 255] to [0, 1], defaults to True
placeholder_value (Any | None, optional) – The value to use as placeholder for missing observations, defaults to None.
swap_channels (bool, optional) – Whether to swap channels, defaults to False

Returns:

Preprocessed observations

Return type:

TorchObsType

agilerl.utils.algo_utils.apply_image_normalization(observation: ndarray | Tensor, observation_space: Box) → ndarray | Tensor

Normalize images using min-max scaling.

Parameters:

observation (ArrayOrTensor) – Observation
observation_space (spaces.Box) – Observation space

Returns:

Normalized observation

Return type:

ArrayOrTensor
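
As a reminder of what min-max scaling computes, here is a minimal pure-Python sketch. It is an illustration only, not the library code: `minmax_scale` is a hypothetical helper, and `low` / `high` stand in for the bounds that a `spaces.Box` would provide.

```python
from __future__ import annotations


def minmax_scale(values: list[float], low: float, high: float) -> list[float]:
    """Map each value from [low, high] onto [0, 1]."""
    if high <= low:
        raise ValueError("high must be strictly greater than low")
    span = high - low
    return [(v - low) / span for v in values]


# 8-bit pixel intensities, assuming bounds of 0 and 255
pixels = [0, 51, 255]
print(minmax_scale(pixels, low=0, high=255))
```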

Experience Handling

Sample experiences given minibatch indices.

Parameters:

minibatch_indices (numpy.ndarray[int]) – Minibatch indices
experiences (tuple[torch.Tensor[float], ...]) – Experiences to sample from

Returns:

Sampled experiences

Return type:

tuple[torch.Tensor[float], …]
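
The sampling step amounts to applying the same index set to every field of the experience tuple. The sketch below shows this with plain lists in place of tensors; `sample_by_indices` is a hypothetical name, and the two fields (states, rewards) are made up for illustration.

```python
from __future__ import annotations

from typing import Sequence


def sample_by_indices(indices: Sequence[int], experiences: tuple[list, ...]) -> tuple[list, ...]:
    """Select the same rows from every field so the fields stay aligned."""
    return tuple([field[i] for i in indices] for field in experiences)


states = [[0.0], [0.1], [0.2], [0.3]]
rewards = [1.0, 0.0, 0.5, 2.0]
batch_states, batch_rewards = sample_by_indices([2, 0], (states, rewards))
print(batch_states, batch_rewards)
```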

Stacks experiences into a single array or tensor.

Parameters:

experiences (list[numpy.ndarray[float]] or list[dict[str, numpy.ndarray[float]]]) – Experiences to stack
to_torch (bool, optional) – If True, convert the stacked experiences to a torch tensor, defaults to True

Returns:

Stacked experiences

Return type:

tuple[ArrayOrTensor, …]

Stacks experiences into a single tensor, padding them to the maximum length.

Parameters:

experiences (list[numpy.ndarray[float]] or list[dict[str, numpy.ndarray[float]]]) – Experiences to stack
to_torch (bool, optional) – If True, convert the stacked experiences to a torch tensor, defaults to True
padding_side (str, optional) – Side to pad on, defaults to “right”

Returns:

Stacked experiences

Return type:

tuple[ArrayOrTensor, …]
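
Padding to the maximum length before stacking can be sketched as follows. This is a simplified, list-based illustration rather than the library's implementation: `pad_and_stack` and `pad_value` are hypothetical, and only one-dimensional sequences are handled.

```python
from __future__ import annotations


def pad_and_stack(
    sequences: list[list[float]],
    pad_value: float = 0.0,
    padding_side: str = "right",
) -> list[list[float]]:
    """Pad every sequence to the longest length, then return them as rows."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    max_len = max((len(seq) for seq in sequences), default=0)
    rows = []
    for seq in sequences:
        fill = [pad_value] * (max_len - len(seq))
        # Right padding appends the fill, left padding prepends it
        rows.append(list(seq) + fill if padding_side == "right" else fill + list(seq))
    return rows


print(pad_and_stack([[1.0, 2.0, 3.0], [4.0]]))
print(pad_and_stack([[1.0, 2.0, 3.0], [4.0]], padding_side="left"))
```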

Flattens experiences into a single array or tensor.

Parameters:: experiences (tuple[numpy.ndarray[float], ...] or tuple[torch.Tensor[float], ...]) – Experiences to flatten
Returns:: Flattened experiences
Return type:: tuple[numpy.ndarray[float], …] or tuple[torch.Tensor[float], …]

Check if experiences are vectorized.

Parameters:: experiences (tuple[numpy.ndarray[float], ...] or tuple[torch.Tensor[float], ...]) – Experiences to check
Returns:: True if experiences are vectorized, False otherwise
Return type:: bool

Reorganizes experiences into a tensor, vectorized by time step.

Example input: {'agent_0': [[1, 2, 3, 4]], 'agent_1': [[5, 6, 7, 8]]}

Example output: torch.Tensor([[1, 2, 3, 4, 5, 6, 7, 8]])

Parameters:

experiences (ExperiencesType) – Dictionaries containing experiences indexed by agent_id that share a policy agent.
dim (int) – New dimension to stack along

Returns:

Tensor, dict of tensors, or tuple of tensors of experiences, stacked along provided dimension

Return type:

torch.Tensor | dict[str, torch.Tensor] | tuple[torch.Tensor, …]
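
The example input and output given for this function can be reproduced with nested lists. The sketch below only mirrors that one example (per-time-step rows from each agent joined into a single row); `concat_agents_per_step` is a hypothetical name, and the real function operates on tensors and accepts a `dim` argument that this sketch does not model.

```python
from __future__ import annotations


def concat_agents_per_step(experiences: dict[str, list[list[float]]]) -> list[list[float]]:
    """Join the rows of all agents for each time step into one row."""
    per_agent = list(experiences.values())  # dicts preserve insertion order
    return [
        [value for row in rows_at_step for value in row]
        for rows_at_step in zip(*per_agent)  # one row per agent at this step
    ]


experiences = {"agent_0": [[1, 2, 3, 4]], "agent_1": [[5, 6, 7, 8]]}
print(concat_agents_per_step(experiences))
```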

Convert experience to numpy array.

Parameters:

experience (dict | tuple | np.ndarray) – Experience to convert
space (spaces.Space) – Space to convert experience to
actions (bool, optional) – Whether the experience is an action, defaults to False

Returns:

Numpy array of experience

Return type:

np.ndarray

Concatenate tensors along first dimension.

Parameters:: tensors (list[TorchObsType]) – List of tensors to concatenate
Returns:: Concatenated tensor
Return type:: TorchObsType

Reshape tensor from space.

Parameters:

tensor (TorchObsType) – Tensor to reshape
space (spaces.Space) – Space to reshape tensor to

Returns:

Reshaped tensor

Return type:

TorchObsType

Reorganizes experiences into a batched tensor.

Example input: {'agent_0': [[[…1], […2]], [[…5], […6]]], 'agent_1': [[[…3], […4]], [[…7], […8]]]}

Example output: torch.Tensor([[…1], […2], […3], […4], […5], […6], […7], […8]])

Parameters:

experiences (ExperiencesType) – Dictionaries containing experiences indexed by agent_id that share a policy agent.
space (spaces.Space) – Observation/action/etc space to maintain
actions (bool, optional) – Whether the experiences are actions, defaults to False

Returns:

Tensor, dict of tensors, or tuple of tensors of experiences, stacked along first dimension, with shape (num_experiences, *shape)

Return type:

torch.Tensor | dict[str, torch.Tensor] | tuple[torch.Tensor, …]
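
The ordering in the example (all agents for the first time step, then all agents for the next) can be sketched with nested lists. This is an illustration under that reading of the example, not the library's implementation; `flatten_to_batch` is a hypothetical name, and the `space` and `actions` arguments are not modeled.

```python
from __future__ import annotations


def flatten_to_batch(experiences: dict[str, list[list[list[float]]]]) -> list[list[float]]:
    """Flatten {agent: [time][env][features]} into a list of feature rows.

    Rows are ordered by time step first, then agent, then environment.
    """
    per_agent = list(experiences.values())
    batch: list[list[float]] = []
    for agents_at_step in zip(*per_agent):  # iterate over time steps
        for env_rows in agents_at_step:     # iterate over agents
            batch.extend(env_rows)          # one row per environment
    return batch


experiences = {
    "agent_0": [[[1], [2]], [[5], [6]]],
    "agent_1": [[[3], [4]], [[7], [8]]],
}
print(flatten_to_batch(experiences))
```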

Checkpoint and Serialization

agilerl.utils.algo_utils.make_safe_deepcopies(*args: EvolvableModuleProtocol | list[EvolvableModuleProtocol]) → list[EvolvableModuleProtocol]

Make deep copies of EvolvableModule objects and their attributes.

Parameters:: args (EvolvableModuleProtocol | list[EvolvableModuleProtocol]) – EvolvableModuleProtocol objects or lists of EvolvableModuleProtocol objects to copy.
Returns:: Deep copies of the EvolvableModule objects and their attributes.
Return type:: list[EvolvableModuleProtocol]

agilerl.utils.algo_utils.isroutine(obj: object) → bool

Check if an object is a routine, considering also methods wrapped by CudaGraphModule.

Parameters:: obj (object) – The object to check.
Returns:: True if the object is a routine, False otherwise.
Return type:: bool

agilerl.utils.algo_utils.recursive_check_module_attrs(obj: Any, networks_only: bool = False) → bool

Recursively check if the object has any attributes that are EvolvableModuleProtocol or Optimizer objects, excluding metaclasses.

Parameters:

obj (Any) – The object to check for EvolvableModuleProtocol or Optimizer objects.
networks_only (bool, optional) – If True, only check for EvolvableModule objects, defaults to False

Returns:

True if the object has any attributes that are EvolvableModuleProtocol or Optimizer objects, False otherwise.

Return type:

bool

agilerl.utils.algo_utils.chkpt_attribute_to_device(chkpt_dict: dict[str, Tensor], device: str) → dict[str, Any]

Place checkpoint attributes on device. Used when loading saved agents.

Parameters:

chkpt_dict (dict) – Checkpoint dictionary
device (str) – Device for accelerated computing, ‘cpu’ or ‘cuda’

Returns:

Checkpoint dictionary with attributes on device

Return type:

dict[str, Any]

agilerl.utils.algo_utils.key_in_nested_dict(nested_dict: dict[str, Any], target: str) → bool

Determine if a key is present at any level of a nested dictionary.

Parameters:

nested_dict (dict[str, dict[str, ...]]) – Nested dictionary
target (str) – Target string

Returns:

True if key is in nested dictionary, False otherwise

Return type:

bool
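
A recursive search of this kind can be sketched in a few lines. The helper below, `has_key`, is a hypothetical stand-in written for illustration; it assumes that only values which are themselves dictionaries are descended into.

```python
from __future__ import annotations

from typing import Any


def has_key(nested: dict[str, Any], target: str) -> bool:
    """Return True if `target` is a key at any depth of `nested`."""
    for key, value in nested.items():
        if key == target:
            return True
        # Only dictionaries are searched further; lists and other values are skipped
        if isinstance(value, dict) and has_key(value, target):
            return True
    return False


config = {"encoder_config": {"hidden_size": [64, 64]}, "head_config": {"activation": "ReLU"}}
print(has_key(config, "hidden_size"))
print(has_key(config, "kernel_size"))
```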

agilerl.utils.algo_utils.remove_compile_prefix(state_dict: dict[str, Any]) → dict[str, Any]

Remove the _orig_mod prefix from the keys of a state dict created by torch.compile.

Parameters:: state_dict (dict) – model state dict
Returns:: state dict with prefix removed
Return type:: dict[str, Any]
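
The key renaming can be illustrated on a plain dictionary. This sketch is not the library's code: `strip_compile_prefix` is a hypothetical name, the values are placeholders rather than tensors, and only a leading `_orig_mod.` prefix is handled.

```python
from __future__ import annotations

from typing import Any

PREFIX = "_orig_mod."  # prefix that appears on parameter names of a compiled module


def strip_compile_prefix(state_dict: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the state dict with a leading prefix removed from each key."""
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in state_dict.items()
    }


compiled = {"_orig_mod.linear.weight": "W", "_orig_mod.linear.bias": "b", "step": 3}
print(strip_compile_prefix(compiled))
```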

agilerl.utils.algo_utils.module_checkpoint_dict(module: EvolvableModuleProtocol | ModuleDictProtocol | Optimizer | dict[str, Optimizer] | OptimizerWrapperProtocol, name: str) → dict[str, Any]

Return a dictionary containing the module’s class, init dict, and state dict.

Parameters:

module (EvolvableAttributeType) – The module to checkpoint.
name (str) – The name of the attribute to checkpoint.

Returns:

A dictionary containing the module’s class, init dict, and state dict.

Return type:

dict[str, Any]

agilerl.utils.algo_utils.module_checkpoint_single(module: EvolvableModuleProtocol, name: str) → dict[str, Any]

Return a dictionary containing the module’s class, init dict, and state dict.

Parameters:

module (EvolvableModuleProtocol) – The module to checkpoint.
name (str) – The name of the attribute to checkpoint.

Returns:

A dictionary containing the module’s class, init dict, and state dict.

Return type:

dict[str, Any]

agilerl.utils.algo_utils.module_checkpoint_multiagent(module: ModuleDictProtocol[T | EvolvableModuleProtocol | OptimizedModule | EvolvableNetworkProtocol], name: str) → dict[str, Any]

Return a dictionary containing the module’s class, init dict, and state dict.

Parameters:

module (ModuleDictProtocol) – The module to checkpoint.
name (str) – The name of the attribute to checkpoint.

Returns:

A dictionary containing the module’s class, init dict, and state dict.

Return type:

dict[str, Any]

Learning Rate Scheduling

class agilerl.utils.algo_utils.CosineLRScheduleConfig(num_epochs: int, warmup_proportion: float)

Data class to configure a cosine LR scheduler.

agilerl.utils.algo_utils.create_warmup_cosine_scheduler(optimizer: Optimizer, config: CosineLRScheduleConfig, min_lr: float, max_lr: float) → SequentialLR

Create a cosine annealing learning rate scheduler with warm-up.

Parameters:

optimizer (torch.optim.Optimizer) – Optimizer
config (CosineLRScheduleConfig) – LR scheduler config
min_lr (float) – Minimum learning rate
max_lr (float) – Maximum learning rate

Returns:

Sequential learning rate scheduler

Return type:

SequentialLR
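
The general shape of a warm-up plus cosine schedule can be written as a closed-form function of the step. The sketch below is a generic illustration, not a description of what this function builds internally: `warmup_cosine_lr` and `total_steps` are hypothetical, and a linear warm-up from `min_lr` to `max_lr` followed by cosine decay back to `min_lr` is an assumption.

```python
import math


def warmup_cosine_lr(
    step: int,
    total_steps: int,
    warmup_proportion: float,
    min_lr: float,
    max_lr: float,
) -> float:
    """Learning rate at `step` for linear warm-up followed by cosine decay."""
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Linear ramp from min_lr towards max_lr
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine decay from max_lr (progress = 0) to min_lr (progress = 1)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 5, 10, 55, 100):
    print(step, warmup_cosine_lr(step, total_steps=100, warmup_proportion=0.1, min_lr=1e-5, max_lr=1e-3))
```

With these assumptions the rate starts at `min_lr`, reaches `max_lr` at the end of warm-up, and returns to `min_lr` at the final step.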

File and Directory Utilities

agilerl.utils.algo_utils.remove_nested_files(files: list[str]) → None

Remove nested files from a list of files.

Parameters:

files (list[str]) – List of files to remove nested files from