Algorithm Utils
Space and Observation Utilities
- agilerl.utils.algo_utils.get_input_size_from_space(observation_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | list[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary]) int | dict[str, int] | tuple[int, ...]¶
Return the dimension of the state space as it pertains to the underlying networks (i.e. the input size of the networks).
- agilerl.utils.algo_utils.get_output_size_from_space(action_space: Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary | list[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary]) int | dict[str, int] | tuple[int, ...]¶
Return the dimension of the action space as it pertains to the underlying networks (i.e. the output size of the networks).
- agilerl.utils.algo_utils.get_obs_shape(space: Space) tuple[int, ...] | dict[str, tuple[int, ...]]¶
Return the shape of the observation space.
- agilerl.utils.algo_utils.get_num_actions(space: Space) int¶
Return the number of actions.
- Parameters:
space (spaces.Space) – Action space
- Returns:
Number of actions
- Return type:
int
- agilerl.utils.algo_utils.is_image_space(space: Space) bool¶
Check if the space is an image space. We ignore dtype and number of channels checks.
- Parameters:
space (spaces.Space) – Input space
- Returns:
True if the space is an image space, False otherwise
- Return type:
bool
- agilerl.utils.algo_utils.concatenate_spaces(space_list: list[Box | Discrete | MultiDiscrete | Dict | Tuple | MultiBinary]) Space¶
Concatenates a list of spaces into a single space. If spaces correspond to images, we check that their shapes are the same and use the first space’s shape as the shape of the concatenated space.
- Parameters:
space_list (list[SupportedObsSpaces]) – List of spaces to concatenate
- Returns:
Concatenated space
- Return type:
spaces.Space
Network and Model Utilities
Shares the encoder parameters between the policy and any number of other networks.
- Parameters:
policy (EvolvableNetworkProtocol) – The policy network whose encoder parameters will be used.
others (EvolvableNetworkProtocol) – The other networks whose encoder parameters will be pinned to the policy.
Loops through all of the modules in the model and checks if they have a hidden_state_architecture attribute. If they do, it adds the items to a dictionary and returns it. This should make it easier to initialize the hidden states of the model.
Format the shared critic (i.e. EvolvableMultiInput) config from the available encoder configs from all of the sub-agents. This dictionary is built when extracting the net config passed by the user in MultiAgentAlgorithm.extract_net_config.
Note
If the user specifies different MLP configurations for different sub-agents / groups, the deepest MLP config is used for the shared critic’s EvolvableMLP.
- agilerl.utils.algo_utils.get_deepest_head_config(net_config: dict[str, dict[str, Any] | Any], agent_ids: list[str]) dict[str, dict[str, Any] | Any]¶
Return the deepest head config from the nested net config.
- agilerl.utils.algo_utils.is_peft_model(model: Module) bool¶
Check if a model is a PEFT model.
- Parameters:
model (nn.Module) – Model to check
- Returns:
True if the model is a PEFT model, False otherwise
- Return type:
bool
Observation Processing
- agilerl.utils.algo_utils.obs_channels_to_first(observation: ndarray | dict[str, ndarray] | tuple[ndarray, ...], expand_dims: bool = False) ndarray | dict[str, ndarray] | tuple[ndarray, ...]¶
Convert an observation from channels-last to channels-first format.
- Parameters:
observation (NumpyObsType) – Observation
expand_dims (bool, optional) – If True, expand the dimensions of the observation, defaults to False
- Returns:
Observation in channels-first format
- Return type:
NumpyObsType
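The channels-last to channels-first conversion can be illustrated without any array library. The following is a minimal sketch on nested lists, not the library's implementation; the helper name `channels_last_to_first` is hypothetical, and the `expand_dims` option is not modeled.

```python
def channels_last_to_first(image):
    """Transpose a nested-list image from (H, W, C) to (C, H, W).

    Hypothetical helper for illustration only; it is not part of agilerl.
    """
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    # Build one (H, W) plane per channel.
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 2x2 image with 3 channels, stored channels-last.
hwc = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
chw = channels_last_to_first(hwc)
print(chw[0])  # first channel plane: [[1, 4], [7, 10]]
```

With array inputs the same reordering is normally a single axis transpose; the nested loops here only make the index mapping explicit.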
- agilerl.utils.algo_utils.obs_to_tensor(obs: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, device: str | device) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]¶
Move the observation to the given device as a PyTorch tensor.
- Parameters:
obs (ObservationType) – Observation to convert
device (str | torch.device) – PyTorch device
- Returns:
PyTorch tensor of the observation on a desired device.
- Return type:
TorchObsType
- agilerl.utils.algo_utils.get_vect_dim(observation: ndarray | dict[str, ndarray] | tuple[ndarray, ...], observation_space: Space) int¶
Return the number of vectorized environments given an observation and its corresponding space.
- Parameters:
observation (NumpyObsType) – Observation
observation_space (spaces.Space) – Observation space
- Returns:
Number of vectorized environments
- Return type:
int
- agilerl.utils.algo_utils.add_placeholder_value(obs: Tensor, placeholder_value: float) Tensor¶
Add placeholder value to observation.
- Parameters:
obs (torch.Tensor) – Observation
placeholder_value (float) – Placeholder value
- Returns:
Observation with placeholder value
- Return type:
torch.Tensor
- agilerl.utils.algo_utils.maybe_add_batch_dim(array_like: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, space: Space, actions: bool = False) ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts¶
- agilerl.utils.algo_utils.maybe_add_batch_dim(array_like: ndarray, space: Space, actions: bool = False) ndarray
- agilerl.utils.algo_utils.maybe_add_batch_dim(array_like: Tensor, space: Space, actions: bool = False) Tensor
Add batch dimension if necessary.
- Parameters:
array_like (ObservationType) – Array or tensor
space (spaces.Space) – Observation space
actions (bool, optional) – Whether the array is an action, defaults to False
- Returns:
Observation tensor with batch dimension
- Return type:
ObservationType
- agilerl.utils.algo_utils.preprocess_observation(observation_space: Space, observation: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]¶
- agilerl.utils.algo_utils.preprocess_observation(observation_space: Dict, observation: dict[str, ndarray | Tensor], device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) dict[str, Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]]
- agilerl.utils.algo_utils.preprocess_observation(observation_space: Tuple, observation: tuple[ndarray | Tensor, ...], device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) tuple[Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], ...]
- agilerl.utils.algo_utils.preprocess_observation(observation_space: Box, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor
- agilerl.utils.algo_utils.preprocess_observation(observation_space: Discrete, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor
- agilerl.utils.algo_utils.preprocess_observation(observation_space: MultiDiscrete, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor
- agilerl.utils.algo_utils.preprocess_observation(observation_space: MultiBinary, observation: ndarray | Tensor, device: str | device = 'cpu', normalize_images: bool = True, placeholder_value: Any | None = None) Tensor
Preprocesses observations for forward pass through neural network.
- Parameters:
observation_space (spaces.Space) – The observation space of the environment
observation (ObservationType) – Observations of environment
device (str | torch.device, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to “cpu”
normalize_images (bool, optional) – Normalize images from [0, 255] to [0, 1], defaults to True
placeholder_value (Any | None, optional) – The value to use as placeholder for missing observations, defaults to None.
- Returns:
Preprocessed observations
- Return type:
torch.Tensor[float] or dict[str, torch.Tensor[float]] or tuple[torch.Tensor[float], …]
- agilerl.utils.algo_utils.apply_image_normalization(observation: ndarray | Tensor, observation_space: Box) ndarray | Tensor¶
Normalize images using min-max scaling.
- Parameters:
observation (ArrayOrTensor) – Observation
observation_space (spaces.Box) – Observation space
- Returns:
Normalized observation
- Return type:
ArrayOrTensor
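As a rough illustration of min-max scaling, the sketch below maps values from a `[low, high]` range onto `[0, 1]`. It assumes the bounds would come from the Box space; the helper name `minmax_normalize` and its default bounds are hypothetical and do not reflect the library's actual code.

```python
def minmax_normalize(pixels, low=0.0, high=255.0):
    """Scale each value from [low, high] to [0, 1].

    Hypothetical helper; low/high stand in for the bounds of a Box space.
    """
    span = high - low
    if span == 0:
        raise ValueError("low and high must differ")
    return [(p - low) / span for p in pixels]


normalized = minmax_normalize([0, 51, 255])
print(normalized)
```

For 8-bit images this reduces to dividing by 255, which matches the [0, 255] to [0, 1] normalization described for `normalize_images`.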
Experience Handling
- agilerl.utils.algo_utils.get_experiences_samples(minibatch_indices: ndarray, *experiences: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]) tuple[Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], ...]¶
Sample experiences given minibatch indices.
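Conceptually, sampling by minibatch indices means applying the same index selection to every experience component. The sketch below shows that idea on plain lists; `sample_by_indices` is a hypothetical name, and the real function operates on tensors and nested tensor containers rather than lists.

```python
def sample_by_indices(indices, *experiences):
    """Select the same rows from every experience sequence.

    Hypothetical list-based stand-in for tensor indexing.
    """
    return tuple([exp[i] for i in indices] for exp in experiences)


states = ["s0", "s1", "s2", "s3"]
rewards = [0.0, 1.0, 0.5, -1.0]
batch_states, batch_rewards = sample_by_indices([2, 0], states, rewards)
print(batch_states, batch_rewards)  # ['s2', 's0'] [0.5, 0.0]
```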
- agilerl.utils.algo_utils.stack_experiences(*experiences: list[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, to_torch: bool = True) tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...]¶
Stacks experiences into a single array or tensor.
- agilerl.utils.algo_utils.stack_and_pad_experiences(*experiences: list[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, padding_values: list[int | float | bool], padding_side: str = 'right', device: str | None = None) tuple[ndarray | Tensor, ...]¶
Stacks experiences into a single tensor, padding them to the maximum length.
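- Parameters:
experiences (ObservationType) – Experiences to stack
padding_values (list[int | float | bool]) – Padding values
padding_side (str, optional) – Side on which to pad, defaults to ‘right’
device (str | None, optional) – Device for the stacked experiences, defaults to None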
- Returns:
Stacked experiences
- Return type:
tuple[ArrayOrTensor, …]
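Padding to the maximum length can be sketched as follows. This is a list-based illustration under the assumption that `padding_side` accepts "left" or "right"; the helper `pad_sequences` is hypothetical, and device placement and tensor conversion are not modeled.

```python
def pad_sequences(sequences, padding_value, padding_side="right"):
    """Pad variable-length sequences to the length of the longest one.

    Hypothetical helper; the real function works on tensors.
    """
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        fill = [padding_value] * (max_len - len(seq))
        if padding_side == "right":
            padded.append(list(seq) + fill)
        else:
            padded.append(fill + list(seq))
    return padded


right_padded = pad_sequences([[1, 2, 3], [4]], padding_value=0)
left_padded = pad_sequences([[1, 2, 3], [4]], padding_value=0, padding_side="left")
print(right_padded)  # [[1, 2, 3], [4, 0, 0]]
print(left_padded)   # [[1, 2, 3], [0, 0, 4]]
```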
- agilerl.utils.algo_utils.flatten_experiences(*experiences: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts) tuple[ndarray | Tensor, ...]¶
Flattens experiences into a single array or tensor.
- agilerl.utils.algo_utils.is_vectorized_experiences(*experiences: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts) bool¶
Check if experiences are vectorized.
- agilerl.utils.algo_utils.vectorize_experiences_by_agent(experiences: dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...], dim: int = 1) Tensor | dict[str, Tensor] | tuple[Tensor, ...]¶
Reorganizes experiences into a tensor, vectorized by time step.
Example input: {‘agent_0’: [[1, 2, 3, 4]], ‘agent_1’: [[5, 6, 7, 8]]}
Example output: torch.Tensor([[1, 2, 3, 4, 5, 6, 7, 8]])
- Parameters:
experiences (ExperiencesType) – Dictionaries containing experiences indexed by agent_id that share a policy agent.
dim (int) – New dimension to stack along
- Returns:
Tensor, dict of tensors, or tuple of tensors of experiences, stacked along provided dimension
- Return type:
torch.Tensor | dict[str, torch.Tensor] | tuple[torch.Tensor, …]
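The documented example can be reproduced with plain lists. The sketch below only mirrors that single input/output pair; it does not model the `dim` argument or nested dict/tuple inputs, and `concat_agents_per_row` is a hypothetical name.

```python
def concat_agents_per_row(experiences):
    """Join the per-agent rows of each time step into one row.

    Hypothetical list-based illustration of the documented example.
    """
    agent_rows = list(experiences.values())
    return [
        [value for row in rows_at_t for value in row]
        for rows_at_t in zip(*agent_rows)
    ]


example = {"agent_0": [[1, 2, 3, 4]], "agent_1": [[5, 6, 7, 8]]}
result = concat_agents_per_row(example)
print(result)  # [[1, 2, 3, 4, 5, 6, 7, 8]]
```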
- agilerl.utils.algo_utils.experience_to_tensors(experience: dict | tuple | ndarray, space: Space, actions: bool = False) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]¶
Convert experience to PyTorch tensors.
- agilerl.utils.algo_utils.concatenate_tensors(tensors: list[Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]]) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]¶
Concatenate tensors along first dimension.
- Parameters:
tensors (list[TorchObsType]) – List of tensors to concatenate
- Returns:
Concatenated tensor
- Return type:
TorchObsType
- agilerl.utils.algo_utils.reshape_from_space(tensor: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], space: Space) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]¶
Reshape a tensor to match the shape of the given space.
- Parameters:
tensor (TorchObsType) – Tensor to reshape
space (spaces.Space) – Space to reshape tensor to
- Returns:
Reshaped tensor
- Return type:
TorchObsType
- agilerl.utils.algo_utils.concatenate_experiences_into_batches(experiences: dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...], space: Space, actions: bool = False) Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]¶
Reorganizes experiences into a batched tensor.
Example input: {‘agent_0’: [[[…1], […2]], [[…5], […6]]],
‘agent_1’: [[[…3], […4]], [[…7], […8]]]}
Example output: torch.Tensor([…1], […2], […3], […4], […5], […6], […7], […8])
- Parameters:
experiences (ExperiencesType) – Dictionaries containing experiences indexed by agent_id that share a policy agent.
space (spaces.Space) – Observation/action/etc space to maintain
actions (bool, optional) – Whether the experiences are actions, defaults to False
- Returns:
Tensor, dict of tensors, or tuple of tensors of experiences, stacked along first dimension, with shape (num_experiences, *shape)
- Return type:
torch.Tensor | dict[str, torch.Tensor] | tuple[torch.Tensor, …]
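The ordering in the example above (all agents at the first step, then all agents at the next step) can be sketched with nested lists. The concrete values stand in for the elided entries of the documented example, and `flatten_to_batch` is a hypothetical helper; the `space` and `actions` arguments are not modeled.

```python
def flatten_to_batch(experiences):
    """Flatten {agent: [step][item]} into one batch, step by step.

    Hypothetical list-based illustration of the documented ordering.
    """
    batch = []
    # zip(*...) groups the entries of all agents for the same step.
    for per_agent_at_step in zip(*experiences.values()):
        for agent_items in per_agent_at_step:
            batch.extend(agent_items)
    return batch


example = {
    "agent_0": [[[1], [2]], [[5], [6]]],
    "agent_1": [[[3], [4]], [[7], [8]]],
}
batch = flatten_to_batch(example)
print(batch)  # [[1], [2], [3], [4], [5], [6], [7], [8]]
```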
Checkpoint and Serialization
- agilerl.utils.algo_utils.make_safe_deepcopies(*args: EvolvableModuleProtocol | list[EvolvableModuleProtocol]) list[EvolvableModuleProtocol]¶
Make deep copies of EvolvableModule objects and their attributes.
- agilerl.utils.algo_utils.isroutine(obj: object) bool¶
Check if an attribute is a routine, considering also methods wrapped by CudaGraphModule.
- agilerl.utils.algo_utils.recursive_check_module_attrs(obj: Any, networks_only: bool = False) bool¶
Recursively check if the object has any attributes that are EvolvableModuleProtocol objects or Optimizers, excluding metaclasses.
- Parameters:
obj (Any) – The object to check for EvolvableModuleProtocol objects or Optimizers.
networks_only (bool, optional) – If True, only check for EvolvableModule objects, defaults to False
- Returns:
True if the object has any attributes that are EvolvableModuleProtocol objects or Optimizers, False otherwise.
- Return type:
bool
- agilerl.utils.algo_utils.chkpt_attribute_to_device(chkpt_dict: dict[str, Tensor], device: str) dict[str, Any]¶
Place checkpoint attributes on device. Used when loading saved agents.
- agilerl.utils.algo_utils.key_in_nested_dict(nested_dict: dict[str, Any], target: str) bool¶
Determine if key is in nested dictionary.
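A nested key lookup of this kind is usually a short recursion. The following is a minimal sketch of the idea, not the library's code; the name `key_in_nested` is hypothetical, and only dict values are descended into.

```python
def key_in_nested(nested, target):
    """Return True if target is a key at any depth of a nested dict.

    Hypothetical helper for illustration only.
    """
    for key, value in nested.items():
        if key == target:
            return True
        if isinstance(value, dict) and key_in_nested(value, target):
            return True
    return False


config = {"encoder": {"mlp": {"hidden_size": [64, 64]}}, "lr": 1e-3}
found = key_in_nested(config, "hidden_size")
missing = key_in_nested(config, "batch_size")
print(found, missing)  # True False
```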
- agilerl.utils.algo_utils.remove_compile_prefix(state_dict: dict[str, Any]) dict[str, Any]¶
Remove the _orig_mod prefix from a state dict created by torch.compile.
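Modules wrapped by torch.compile expose state dict keys prefixed with `_orig_mod.`. The sketch below strips that leading prefix from a plain dict; it is an illustration under the assumption that only a leading prefix needs handling, and `strip_compile_prefix` is a hypothetical name.

```python
PREFIX = "_orig_mod."


def strip_compile_prefix(state_dict):
    """Return a copy of state_dict with a leading '_orig_mod.' removed.

    Hypothetical helper; values are passed through unchanged.
    """
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in state_dict.items()
    }


compiled_state = {"_orig_mod.layer.weight": 1, "_orig_mod.layer.bias": 2, "step": 3}
clean_state = strip_compile_prefix(compiled_state)
print(sorted(clean_state))  # ['layer.bias', 'layer.weight', 'step']
```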
- agilerl.utils.algo_utils.module_checkpoint_dict(module: EvolvableModuleProtocol | ModuleDictProtocol | Optimizer | dict[str, Optimizer] | OptimizerWrapperProtocol, name: str) dict[str, Any]¶
Return a dictionary containing the module’s class, init dict, and state dict.
- agilerl.utils.algo_utils.module_checkpoint_single(module: EvolvableModuleProtocol, name: str) dict[str, Any]¶
Return a dictionary containing the module’s class, init dict, and state dict.
Learning Rate Scheduling
- class agilerl.utils.algo_utils.CosineLRScheduleConfig(num_epochs: int, warmup_proportion: float)¶
Data class to configure a cosine LR scheduler.
- agilerl.utils.algo_utils.create_warmup_cosine_scheduler(optimizer: Optimizer, config: CosineLRScheduleConfig, min_lr: float, max_lr: float) SequentialLR¶
Create cosine annealing lr scheduler with warm-up.
- Parameters:
optimizer (torch.optim.Optimizer) – Optimizer
config (CosineLRScheduleConfig) – LR scheduler config
min_lr (float) – Minimum learning rate
max_lr (float) – Maximum learning rate
- Returns:
Sequential learning rate scheduler
- Return type:
SequentialLR
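The shape of a warm-up plus cosine schedule can be sketched in pure Python. This is an illustration only: it assumes a linear warm-up from `min_lr` to `max_lr` over `warmup_proportion` of `num_epochs`, followed by cosine decay back to `min_lr`. The function `warmup_cosine_lr` is hypothetical and may differ from how the returned SequentialLR behaves.

```python
import math


def warmup_cosine_lr(step, num_epochs, warmup_proportion, min_lr, max_lr):
    """Learning rate at a given step for linear warm-up + cosine decay.

    Hypothetical helper; the warm-up shape and step counting are assumptions.
    """
    warmup_steps = int(num_epochs * warmup_proportion)
    if step < warmup_steps:
        # Linear ramp from min_lr towards max_lr.
        return min_lr + (max_lr - min_lr) * step / warmup_steps
    decay_steps = max(1, num_epochs - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine decay from max_lr (progress 0) to min_lr (progress 1).
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [warmup_cosine_lr(s, 10, 0.2, 0.0, 1.0) for s in range(11)]
print([round(lr, 3) for lr in schedule])
```

In this sketch the rate peaks at the end of the warm-up and then falls monotonically, reaching `min_lr` at the final step.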