AgentWrapper

Parameters

class agilerl.wrappers.agent.AgentWrapper(agent: AgentType)

Base class for all agent wrappers. Agent wrappers apply additional functionality to the get_action() and learn() methods of an EvolvableAlgorithm instance.

Parameters: agent (AgentType) – Agent to be wrapped
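The wrapping pattern described above can be sketched in plain Python. This is a minimal illustration, not AgileRL's implementation: `DummyAgent` and `ScalingWrapper` are hypothetical names, and the observation rescaling is an arbitrary example of "additional functionality" applied before delegating to the wrapped agent.

```python
from typing import Any


class DummyAgent:
    """Hypothetical stand-in for an agent; not an AgileRL class."""

    def __init__(self) -> None:
        self.training = True

    def get_action(self, obs: float) -> float:
        return obs * 2.0

    def learn(self, experiences: list[float]) -> float:
        return sum(experiences) / len(experiences)


class ScalingWrapper:
    """Hypothetical wrapper that rescales inputs before delegating."""

    def __init__(self, agent: DummyAgent, scale: float = 0.5) -> None:
        self.agent = agent
        self.scale = scale

    def get_action(self, obs: float, *args: Any, **kwargs: Any) -> Any:
        # Pre-process the observation, then defer to the wrapped agent.
        return self.agent.get_action(obs * self.scale, *args, **kwargs)

    def learn(self, experiences: list[float], *args: Any, **kwargs: Any) -> Any:
        # Apply the same pre-processing to the stored experiences.
        scaled = [e * self.scale for e in experiences]
        return self.agent.learn(scaled, *args, **kwargs)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails: forward to the wrapped agent.
        if name == "agent":
            raise AttributeError(name)
        return getattr(self.agent, name)
```

Forwarding unknown attributes to the wrapped agent lets calling code read properties such as `training` from the wrapper as if it were the agent itself.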

clone(index: int | None = None, wrap: bool = True) → SelfAgentWrapper

Clone the wrapper with the underlying agent.

Parameters:

index (int | None, optional) – Index of the agent in a population, defaults to None
wrap (bool, optional) – If True, wrap the models in the clone with the accelerator, defaults to True

Returns:

Cloned agent wrapper

Return type:

SelfAgentWrapper

property device: str | device

Return the device of the agent.

Returns: Device of the agent
Return type: DeviceType

get_action(obs: ObservationType | MARLObservationType, *args: Any, **kwargs: Any) → Any

Return the action from the agent.

Parameters:

obs (ObservationType | MARLObservationType) – Observation from the environment
args (Any) – Additional positional arguments
kwargs (Any) – Additional keyword arguments

Returns:

Action from the agent

Return type:

Any

learn(experiences: ExperiencesType, *args: Any, **kwargs: Any) → Any

Learn from the experiences.

Parameters:

experiences (ExperiencesType) – Experiences from the environment
args (Any) – Additional positional arguments
kwargs (Any) – Additional keyword arguments

Returns:

Learning information

Return type:

Any

load_checkpoint(path: str) → None

Load a checkpoint of agent properties and network weights from path.

Parameters: path (str) – Location to load checkpoint from

save_checkpoint(path: str) → None

Save a checkpoint of agent properties and network weights to path.

Parameters: path (str) – Location to save checkpoint at

property training: bool

Return the training status of the agent.

Returns: Training status of the agent
Return type: bool

RSNorm

Parameters

class agilerl.wrappers.agent.RSNorm(agent: AgentType, epsilon: float = 0.0001, norm_obs_keys: list[str] | None = None)

Wrapper to normalize observations such that each coordinate is centered with unit variance. Handles both single and multi-agent settings, as well as Dict and Tuple observation spaces.

The normalization statistics are updated only while the agent is in training mode. To freeze them during inference, call agent.set_training_mode(False).

Parameters:

agent (RLAlgorithm | MultiAgentRLAlgorithm) – Agent to be wrapped
epsilon (float, optional) – Small value to avoid division by zero, defaults to 1e-4
norm_obs_keys (list[str] | None, optional) – List of observation keys to normalize, defaults to None
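The behaviour described above (running statistics that update only in training mode, with epsilon guarding the division) can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not AgileRL's RunningMeanStd: `RunningStats` and `preprocess` are hypothetical names, and the batched mean/variance merge shown here is one common way to maintain such statistics.

```python
import numpy as np


class RunningStats:
    """Illustrative running mean/variance tracker (hypothetical class)."""

    def __init__(self, shape: tuple[int, ...], epsilon: float = 1e-4) -> None:
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon  # tiny prior count keeps the first update stable
        self.epsilon = epsilon

    def update(self, batch: np.ndarray) -> None:
        # Merge the batch moments into the running moments.
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta**2 * self.count * batch_count / total
        )
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, obs: np.ndarray) -> np.ndarray:
        # epsilon avoids division by zero for constant coordinates.
        return (obs - self.mean) / np.sqrt(self.var + self.epsilon)


def preprocess(stats: RunningStats, obs: np.ndarray, training: bool) -> np.ndarray:
    # Statistics change only in training mode; inference reuses frozen values.
    if training:
        stats.update(obs)
    return stats.normalize(obs)
```

With `training=False`, the same observation always maps to the same normalized value, which is usually what is wanted when evaluating a trained agent.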

static build_rms(observation_space: Space, epsilon: float = 0.0001, norm_obs_keys: list[str] | None = None, device: str | device = 'cpu') → RunningMeanStd | dict[str, RunningMeanStd] | tuple[RunningMeanStd, ...]

Build the RunningMeanStd object(s) based on the observation space.

Parameters: observation_space (spaces.Space) – Observation space of the agent
Returns: RunningMeanStd object(s)
Return type: RunningMeanStd | dict[str, RunningMeanStd] | tuple[RunningMeanStd, ...]

Return the action from the agent after normalizing the observation.

Parameters: obs (ObservationType) – Observation from the environment
Returns: Action from the agent
Return type: Any

learn(experiences: ExperiencesType, *args: Any, **kwargs: Any) → Any

Learn from the experiences after normalizing the observations.

Parameters:

experiences (ExperiencesType) – Experiences from the environment
args (Any) – Additional positional arguments
kwargs (Any) – Additional keyword arguments

Returns:

Learning information

Return type:

Any

Normalize the observation using the RunningMeanStd object(s).

Parameters: observation (ObservationType) – Observation from the environment
Returns: Normalized observation
Return type: ObservationType

Update the running statistics using the observation.

Parameters: observation (ObservationType) – Observation from the environment

AsyncAgentsWrapper

Parameters

class agilerl.wrappers.agent.AsyncAgentsWrapper(agent: MultiAgentRLAlgorithm)

Wrapper for multi-agent algorithms that solve environments with asynchronous agents (i.e. environments where agents don’t return observations with the same frequency).

Warning

This wrapper currently supports IPPO, MADDPG, and MATD3.

Parameters: agent (MultiAgentRLAlgorithm) – MultiAgentRLAlgorithm instance to be wrapped.

Extract the inactive agents from an observation. Inspects each key in the observation dictionary and, if all the values are np.nan (as set by AsyncPettingZooVecEnv), the agent is considered inactive and removed from the observation dictionary.

Parameters: obs (dict[str, ObservationType]) – Observation dictionary
Returns: Tuple of inactive agents and filtered observations
Return type: tuple[dict[str, np.ndarray], dict[str, ObservationType]]
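The all-NaN check described above can be illustrated with a simplified sketch. It assumes a single (non-vectorized) environment and plain array observations, and `split_inactive` is a hypothetical helper, not the wrapper's method. Unlike the documented return type, it reports inactive agents as a simple list of IDs.

```python
import numpy as np


def split_inactive(
    obs: dict[str, np.ndarray],
) -> tuple[list[str], dict[str, np.ndarray]]:
    """Separate agents whose observation is entirely NaN from the rest."""
    inactive: list[str] = []
    active: dict[str, np.ndarray] = {}
    for agent_id, value in obs.items():
        if np.isnan(value).all():
            # Every entry is NaN: treat the agent as inactive this step.
            inactive.append(agent_id)
        else:
            active[agent_id] = value
    return inactive, active
```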

Return the action from the agent.

Since the environments may not return observations for all agents at the same time, we extract inactive agents from the observation and fill in placeholder values for their actions.

Parameters: obs (ObservationType) – Observation from the environment
Returns: Action from the agent
Return type: Any
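As a rough sketch of the placeholder idea, the function below queries a policy only for active agents and then fills in a fixed value for the rest. Everything here is an assumption for illustration: `act_with_placeholders`, the `policy` callable, and the `placeholder` value are hypothetical, and the wrapper's actual placeholder values are not specified in this reference.

```python
from typing import Any, Callable

import numpy as np


def act_with_placeholders(
    obs: dict[str, np.ndarray],
    policy: Callable[[dict[str, np.ndarray]], dict[str, Any]],
    placeholder: Any,
) -> dict[str, Any]:
    """Get actions for active agents and pad inactive ones."""
    # Agents with an all-NaN observation are considered inactive.
    active = {k: v for k, v in obs.items() if not np.isnan(v).all()}
    actions = dict(policy(active))
    for agent_id in obs:
        if agent_id not in active:
            # Inactive agents still need an entry so the action dict is complete.
            actions[agent_id] = placeholder
    return actions
```

Returning an entry for every agent keeps the action dictionary the same shape at every step, regardless of which agents were active.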

learn(experiences: ExperiencesType, *args: Any, **kwargs: Any) → Any

Learn from the collected experiences.

Parameters:

experiences (ExperiencesType) – Experiences from the environment
args (Any) – Additional positional arguments
kwargs (Any) – Additional keyword arguments

Returns:

Learning information

Return type:

Any

Stacks the experiences.

Parameters: experiences (ExperiencesType) – Experiences from the environment