AgentWrapper

Parameters

class agilerl.wrappers.agent.AgentWrapper(agent: AgentType)

Base class for all agent wrappers. Agent wrappers are used to apply additional functionality to the get_action() and learn() methods of an EvolvableAlgorithm instance.

Parameters:

agent (AgentType) – Agent to be wrapped
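As a rough illustration of the wrapper pattern, the sketch below uses a hypothetical ToyAgent in place of a real EvolvableAlgorithm and a hypothetical ToyAgentWrapper. The wrapper adds behaviour around get_action() and learn() and delegates properties such as training and device to the wrapped agent. It does not import AgileRL and is not how AgentWrapper is implemented internally. It only shows the delegation idea.

```python
from typing import Any


class ToyAgent:
    """Hypothetical stand-in for an EvolvableAlgorithm instance."""

    def __init__(self) -> None:
        self.training = True
        self.device = "cpu"

    def get_action(self, obs: float, *args: Any, **kwargs: Any) -> int:
        # Trivial policy: act 1 if the observation is positive, else 0
        return int(obs > 0)

    def learn(self, experiences: tuple, *args: Any, **kwargs: Any) -> float:
        # Pretend "learning info" is just the number of transitions received
        return float(len(experiences))


class ToyAgentWrapper:
    """Hypothetical wrapper: adds behaviour to get_action()/learn()
    and forwards everything else to the wrapped agent."""

    def __init__(self, agent: ToyAgent) -> None:
        self.agent = agent
        self.action_calls = 0

    @property
    def training(self) -> bool:
        # Delegate the training status to the wrapped agent
        return self.agent.training

    @property
    def device(self) -> str:
        # Delegate the device to the wrapped agent
        return self.agent.device

    def get_action(self, obs: float, *args: Any, **kwargs: Any) -> int:
        # Extra functionality: count calls, then delegate to the agent
        self.action_calls += 1
        return self.agent.get_action(obs, *args, **kwargs)

    def learn(self, experiences: tuple, *args: Any, **kwargs: Any) -> float:
        return self.agent.learn(experiences, *args, **kwargs)


wrapped = ToyAgentWrapper(ToyAgent())
print(wrapped.get_action(0.5), wrapped.action_calls, wrapped.training)  # 1 1 True
```

Keeping the wrapper as a thin layer over the agent means callers can use the wrapped and unwrapped objects interchangeably, which is the property the base class is meant to provide.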

clone(index: int | None = None, wrap: bool = True) → SelfAgentWrapper

Clone the wrapper with the underlying agent.

Parameters:
  • index (int | None, optional) – Index of the agent in a population, defaults to None

  • wrap (bool, optional) – If True, wrap the models in the clone with the accelerator, defaults to True

Returns:

Cloned agent wrapper

Return type:

SelfAgentWrapper

property device: str | device

Return the device of the agent.

Returns:

Device of the agent

Return type:

DeviceType

get_action(obs: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts | dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts], *args: Any, **kwargs: Any) → Any

Return the action from the agent.

Parameters:
  • obs (ObservationType | MARLObservationType) – Observation from the environment

  • args (Any) – Additional positional arguments

  • kwargs (Any) – Additional keyword arguments

Returns:

Action from the agent

Return type:

Any

learn(experiences: dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...], *args: Any, **kwargs: Any) → Any

Learns from the experiences.

Parameters:
  • experiences (ExperiencesType) – Experiences from the environment

  • args (Any) – Additional positional arguments

  • kwargs (Any) – Additional keyword arguments

Returns:

Learning information

Return type:

Any

load_checkpoint(path: str) → None

Load a checkpoint of agent properties and network weights from path.

Parameters:

path (str) – Location to load the checkpoint from

save_checkpoint(path: str) → None

Save a checkpoint of agent properties and network weights to path.

Parameters:

path (str) – Location to save the checkpoint to

property training: bool

Return the training status of the agent.

Returns:

Training status of the agent

Return type:

bool

RSNorm

Parameters

class agilerl.wrappers.agent.RSNorm(agent: AgentType, epsilon: float = 0.0001, norm_obs_keys: list[str] | None = None)

Wrapper that normalizes observations using running statistics so that each coordinate has zero mean and unit variance. Handles both single- and multi-agent settings, as well as Dict and Tuple observation spaces.

The normalization statistics are only updated when the agent is in training mode, so updates can be disabled during inference by calling agent.set_training_mode(False).

Warning

This wrapper is currently only supported for off-policy algorithms, since it relies on the passed experiences being formatted as a tuple of PyTorch tensors. AgileRL does not yet use a Buffer class to store experiences for on-policy algorithms, although this is planned for an upcoming update.

Parameters:
  • agent (RLAlgorithm, MultiAgentRLAlgorithm) – Agent to be wrapped

  • epsilon (float, optional) – Small value to avoid division by zero, defaults to 1e-4

  • norm_obs_keys (list[str] | None, optional) – List of observation keys to normalize, defaults to None
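A minimal sketch of running normalization with training-mode gating, assuming the common parallel-variance RunningMeanStd update used in OpenAI Baselines and Gymnasium (the count starts at epsilon and observations are divided by sqrt(var + epsilon)). AgileRL's own RunningMeanStd may differ in detail. RunningMeanStdSketch and NormalizerSketch are hypothetical names, and the plain training attribute stands in for agent.set_training_mode().

```python
import numpy as np


class RunningMeanStdSketch:
    """Running mean/variance tracker (assumed Baselines/Gymnasium-style update)."""

    def __init__(self, shape: tuple[int, ...], epsilon: float = 1e-4) -> None:
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        # Starting the count at epsilon avoids division by zero on the first update
        self.count = epsilon
        self.epsilon = epsilon

    def update(self, batch: np.ndarray) -> None:
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Merge the batch moments into the running moments (parallel variance)
        self.mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta**2 * self.count * batch_count / total)
        self.var = m2 / total
        self.count = total

    def normalize(self, x: np.ndarray) -> np.ndarray:
        # Centre each coordinate and scale it to (approximately) unit variance
        return (x - self.mean) / np.sqrt(self.var + self.epsilon)


class NormalizerSketch:
    """Hypothetical helper: statistics are updated only in training mode."""

    def __init__(self, obs_dim: int, epsilon: float = 1e-4) -> None:
        self.rms = RunningMeanStdSketch((obs_dim,), epsilon)
        self.training = True

    def normalize_observation(self, obs: np.ndarray) -> np.ndarray:
        if self.training:
            self.rms.update(obs)
        return self.rms.normalize(obs)


rng = np.random.default_rng(0)
obs = rng.normal(loc=5.0, scale=3.0, size=(1000, 2))

norm = NormalizerSketch(obs_dim=2)
normed = norm.normalize_observation(obs)  # updates the statistics, then normalizes
print(normed.mean(axis=0), normed.std(axis=0))  # approximately [0, 0] and [1, 1]

norm.training = False  # inference: statistics are frozen
frozen_mean = norm.rms.mean.copy()
shifted = norm.normalize_observation(obs + 100.0)
print(np.allclose(norm.rms.mean, frozen_mean))  # True
```

Merging batch moments instead of storing every observation keeps memory constant, which is why this style of update is the usual choice for observation normalization.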

static build_rms(observation_space: Space, epsilon: float = 0.0001, norm_obs_keys: list[str] | None = None, device: str | device = 'cpu') → RunningMeanStd | dict[str, RunningMeanStd] | tuple[RunningMeanStd, ...]

Build the RunningMeanStd object(s) based on the observation space.

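Parameters:
  • observation_space (spaces.Space) – Observation space of the agent

  • epsilon (float, optional) – Small value to avoid division by zero, defaults to 1e-4

  • norm_obs_keys (list[str] | None, optional) – List of observation keys to normalize, defaults to None

  • device (str | torch.device, optional) – Device for the running statistics, defaults to 'cpu'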

Returns:

RunningMeanStd object(s)

Return type:

RunningMeanStd | dict[str, RunningMeanStd] | tuple[RunningMeanStd, …]
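The sketch below shows one way the returned structure could mirror the observation space: a single statistics object for a Box-like space, a dict for a Dict space and a tuple for a Tuple space. BoxSpec and RMSStats are hypothetical stand-ins for a Box space and RunningMeanStd, and treating keys outside norm_obs_keys as unnormalized (with None meaning every key) is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class BoxSpec:
    """Hypothetical stand-in for a Box observation space: only its shape."""
    shape: tuple[int, ...]


@dataclass
class RMSStats:
    """Hypothetical stand-in for RunningMeanStd: mean, variance and count."""
    mean: np.ndarray
    var: np.ndarray
    count: float


def build_rms_sketch(space, epsilon: float = 1e-4,
                     norm_obs_keys: Optional[list[str]] = None):
    """Mirror the nesting of the observation space with one RMSStats per leaf.

    Assumption: for dict spaces, keys outside norm_obs_keys get no statistics
    (they are left unnormalized); norm_obs_keys=None means every key.
    """
    if isinstance(space, dict):
        keys = norm_obs_keys if norm_obs_keys is not None else list(space)
        return {k: build_rms_sketch(space[k], epsilon) for k in keys}
    if isinstance(space, tuple):
        return tuple(build_rms_sketch(s, epsilon) for s in space)
    # Leaf (Box-like) space: start from zero mean and unit variance
    return RMSStats(np.zeros(space.shape), np.ones(space.shape), epsilon)


space = {"position": BoxSpec((3,)), "image": BoxSpec((8, 8)), "goal": BoxSpec((2,))}
rms = build_rms_sketch(space, norm_obs_keys=["position", "goal"])
print(sorted(rms))  # ['goal', 'position']
```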

get_action(obs: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, *args: Any, **kwargs: Any) → Any

Return the action from the agent after normalizing the observation.

Parameters:
  • obs (ObservationType) – Observation from the environment

  • args (Any) – Additional positional arguments

  • kwargs (Any) – Additional keyword arguments

Returns:

Action from the agent

Return type:

Any

learn(experiences: dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...] | None = None, *args: Any, **kwargs: Any) → Any

Learns from the experiences after normalizing the observations.

Parameters:
  • experiences (ExperiencesType) – Experiences from the environment

  • args (Any) – Additional positional arguments

  • kwargs (Any) – Additional keyword arguments

Returns:

Learning information

Return type:

Any

normalize_observation(observation: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts) → ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts

Normalize the observation using the RunningMeanStd object(s).

Parameters:

observation (ObservationType) – Observation from the environment

Returns:

Normalized observation

Return type:

ObservationType

update_statistics(observation: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts) → None

Update the running statistics using the observation.

Parameters:

observation (ObservationType) – Observation from the environment

AsyncAgentsWrapper

Parameters

class agilerl.wrappers.agent.AsyncAgentsWrapper(agent: MultiAgentRLAlgorithm)

Wrapper for multi-agent algorithms that solve environments with asynchronous agents (i.e. environments where agents don’t return observations with the same frequency).

Warning

This wrapper currently supports only IPPO, MADDPG, and MATD3.

Parameters:

agent (MultiAgentRLAlgorithm) – MultiAgentRLAlgorithm instance to be wrapped.

extract_inactive_agents(obs: dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts]) → tuple[dict[str, ndarray], dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts]]

Extract the inactive agents from an observation. Each key in the observation dictionary is inspected, and if all of its values are np.nan (as set by AsyncPettingZooVecEnv), the agent is considered inactive and is removed from the observation dictionary.

Parameters:

obs (dict[str, ObservationType]) – Observation dictionary

Returns:

Tuple of inactive agents and filtered observations

Return type:

tuple[dict[str, np.ndarray], dict[str, ObservationType]]
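A minimal sketch of the all-NaN check described above, using plain NumPy arrays for each agent's observation. extract_inactive_agents_sketch is a hypothetical name, only array observations are handled, and keeping each inactive agent's raw (all-NaN) observation in the first returned dict is an assumption.

```python
import numpy as np


def extract_inactive_agents_sketch(obs: dict[str, np.ndarray]):
    """Split an observation dict into (inactive, active) agents.

    An agent counts as inactive only when every value of its observation is NaN.
    Assumption: the inactive dict keeps each inactive agent's raw observation.
    """
    inactive: dict[str, np.ndarray] = {}
    active: dict[str, np.ndarray] = {}
    for agent_id, agent_obs in obs.items():
        agent_obs = np.asarray(agent_obs, dtype=np.float64)
        if np.isnan(agent_obs).all():
            inactive[agent_id] = agent_obs
        else:
            active[agent_id] = agent_obs
    return inactive, active


obs = {
    "speaker_0": np.array([0.1, 0.2, 0.3]),
    "listener_0": np.full(3, np.nan),            # no observation this step
    "listener_1": np.array([np.nan, 1.0, 2.0]),  # partially NaN: still active
}
inactive, active = extract_inactive_agents_sketch(obs)
print(sorted(inactive), sorted(active))  # ['listener_0'] ['listener_1', 'speaker_0']
```

Requiring every value to be NaN, rather than any value, keeps an agent active when only part of its observation is missing.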

get_action(obs: ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, *args: Any, **kwargs: Any) → tuple[int | float | ndarray | Tensor | Any, ...] | int | float | ndarray | Tensor | Any

Return the action from the agent.

Since the environment may not return observations for all agents at the same time, inactive agents are extracted from the observation and placeholder values are filled in for their actions.

Parameters:
  • obs (ObservationType) – Observation from the environment

  • args (Any) – Additional positional arguments

  • kwargs (Any) – Additional keyword arguments

Returns:

Action from the agent

Return type:

Any
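The get_action() flow described above (drop inactive agents, act for the rest, pad the result) can be sketched as follows. async_get_action_sketch and toy_policy are hypothetical, actions are kept as a per-agent dict, and using np.nan as the placeholder action is an assumption; the value AgileRL actually fills in may differ.

```python
from typing import Callable

import numpy as np


def async_get_action_sketch(
    obs: dict[str, np.ndarray],
    policy: Callable[[dict[str, np.ndarray]], dict[str, np.ndarray]],
    action_dims: dict[str, int],
    placeholder: float = np.nan,  # assumed placeholder value
) -> dict[str, np.ndarray]:
    # 1. Drop agents whose observation is entirely NaN (inactive this step)
    active = {a: o for a, o in obs.items() if not np.isnan(o).all()}
    inactive = [a for a in obs if a not in active]
    # 2. Query the policy only for the active agents
    actions = dict(policy(active))
    # 3. Give each inactive agent a placeholder action of the right size
    for agent_id in inactive:
        actions[agent_id] = np.full(action_dims[agent_id], placeholder)
    return actions


def toy_policy(active_obs: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    # Hypothetical policy: action is the sum of the observation, as a 1-element array
    return {a: np.array([o.sum()]) for a, o in active_obs.items()}


obs = {"agent_0": np.array([1.0, 2.0]), "agent_1": np.full(2, np.nan)}
actions = async_get_action_sketch(obs, toy_policy, {"agent_0": 1, "agent_1": 1})
```

Padding inactive agents keeps the returned action dict keyed by every agent, so downstream code that expects one entry per agent does not need special cases.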

learn(experiences: dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...], *args: Any, **kwargs: Any) → Any

Learns from the collected experiences.

Parameters:
  • experiences (ExperiencesType) – Experiences from the environment

  • args (Any) – Additional positional arguments

  • kwargs (Any) – Additional keyword arguments

Returns:

Learning information

Return type:

Any

stack_experiences(experiences: dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...]) → dict[str, ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts] | tuple[ndarray | dict[str, ndarray] | tuple[ndarray, ...] | Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | Number | list[ReasoningPrompts] | ReasoningPrompts, ...]

Stacks the experiences.

Parameters:

experiences (ExperiencesType) – Experiences from the environment