EvolvableDistribution

class agilerl.networks.actors.EvolvableDistribution(*args, **kwargs)

Wrapper that outputs a distribution over the action space for an evolvable module. It provides methods to sample actions and compute log probabilities, which are needed by policy-gradient algorithms such as PPO, A2C, and TRPO (a minimal usage sketch follows the parameter list below).

Parameters:
  • action_space (spaces.Space) – Action space of the environment.

  • network (EvolvableModule) – Network that outputs the logits of the distribution.
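
A minimal construction sketch, assuming a Discrete action space and an EvolvableMLP head; the import path and keyword arguments shown for EvolvableMLP are assumptions and may differ between AgileRL versions:

    import torch
    from gymnasium import spaces
    from agilerl.modules import EvolvableMLP
    from agilerl.networks.actors import EvolvableDistribution

    action_space = spaces.Discrete(4)

    # Head that maps a 32-dimensional latent vector to one logit per action
    head = EvolvableMLP(num_inputs=32, num_outputs=4, hidden_size=[64])

    # Wrap the head so that its outputs parameterise a distribution over the action space
    dist_head = EvolvableDistribution(action_space=action_space, network=head)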

clone() → EvolvableDistribution

Clones the distribution.

Returns:

Cloned distribution.

Return type:

EvolvableDistribution

forward(latent: Tensor, action_mask: ArrayOrTensor | None = None) → Distribution

Forward pass of the network.

Parameters:
  • latent (torch.Tensor) – Latent representation used as input to the wrapped network.

  • action_mask (Optional[ArrayOrTensor]) – Mask to apply to the logits. Defaults to None.

Returns:

Distribution over the action space.

Return type:

Distribution
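
A sketch of how the returned distribution is typically used in a PPO-style update, assuming the dist_head from the construction sketch above and that the returned object exposes the standard torch.distributions sample/log_prob/entropy interface:

    latent = torch.randn(8, 32)            # batch of latent features from an encoder
    dist = dist_head(latent)               # forward pass returns a Distribution
    actions = dist.sample()                # sampled actions, shape (8,)
    log_probs = dist.log_prob(actions)     # log-probabilities for the policy-gradient loss
    entropy = dist.entropy()               # entropy bonus term used by PPO/A2C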

get_distribution(probs: Tensor, log_std: Tensor | None = None) → Distribution

Get the distribution over the action space from the outputs of the network.

Parameters:
  • probs (torch.Tensor) – Logits output by the network.

  • log_std (Optional[torch.Tensor]) – Log standard deviation of the action distribution. Defaults to None.

Returns:

Distribution over the action space.

Return type:

Distribution
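
A sketch of calling get_distribution directly with raw logits, continuing the example above and assuming a Discrete action space; for a continuous (Box) space a log_std tensor would also be supplied:

    logits = torch.randn(8, 4)                       # raw logits for 4 discrete actions
    dist = dist_head.get_distribution(probs=logits)  # e.g. a Categorical-style distribution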

property net_config: IsDataclass | Dict[str, Any]

Configuration of the network.

Returns:

Configuration of the network.

Return type:

ConfigType

DeterministicActor

class agilerl.networks.actors.DeterministicActor(*args, **kwargs)

Deterministic actor network for policy-gradient algorithms. Given an observation, it outputs the mean of the action distribution. This is useful for algorithms such as DDPG and TD3 (a usage sketch follows the parameter list below).

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (spaces.Space) – Action space of the environment.

  • encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, in which case an encoder is built automatically from an AgileRL module according to the observation space.

  • encoder_config (ConfigType) – Configuration of the encoder network.

  • head_config (Optional[ConfigType]) – Configuration of the network MLP head.

  • min_latent_dim (int) – Minimum dimension of the latent space representation.

  • max_latent_dim (int) – Maximum dimension of the latent space representation.

  • n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.

  • latent_dim (int) – Dimension of the latent space representation.

  • device (str) – Device to use for the network.
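
A minimal usage sketch for a continuous control task; only the parameters documented above are used, and the observation shape, action shape, and latent_dim value are arbitrary assumptions:

    import torch
    from gymnasium import spaces
    from agilerl.networks.actors import DeterministicActor

    observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
    action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))

    actor = DeterministicActor(
        observation_space=observation_space,
        action_space=action_space,
        latent_dim=32,
        device="cpu",
    )

    obs = torch.randn(16, 8)     # batch of 16 observations
    actions = actor(obs)         # one action per observation, shape (16, 2)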

build_network_head(net_config: IsDataclass | Dict[str, Any] | None = None) → None

Builds the head of the network.

Parameters:

net_config (Optional[ConfigType]) – Configuration of the head.

forward(obs: Tensor | Dict[str, Tensor] | Tuple[Tensor, ...]) → Tensor

Forward pass of the network.

Parameters:

obs (TorchObsType) – Observation input.

Returns:

Output of the network.

Return type:

torch.Tensor

recreate_network() → None

Recreates the network.

StochasticActor

class agilerl.networks.actors.StochasticActor(*args, **kwargs)

Stochastic actor network for policy-gradient algorithms. Given an observation, it outputs a distribution over the action space. This is useful for on-policy policy-gradient algorithms such as PPO, A2C, and TRPO (a usage sketch follows the parameter list below).

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (spaces.Space) – Action space of the environment.

  • encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, in which case an encoder is built automatically from an AgileRL module according to the observation space.

  • encoder_config (ConfigType) – Configuration of the encoder network.

  • head_config (Optional[ConfigType]) – Configuration of the network MLP head.

  • n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.

  • latent_dim (int) – Dimension of the latent space representation.

  • device (str) – Device to use for the network.
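
A minimal usage sketch for a discrete action space; only the parameters documented above are used, and it is assumed that the returned distribution exposes the standard torch.distributions sample/log_prob interface:

    import torch
    from gymnasium import spaces
    from agilerl.networks.actors import StochasticActor

    observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
    action_space = spaces.Discrete(4)

    actor = StochasticActor(
        observation_space=observation_space,
        action_space=action_space,
        latent_dim=32,
        device="cpu",
    )

    obs = torch.randn(16, 8)             # batch of 16 observations
    dist = actor(obs)                    # Distribution over the 4 discrete actions
    actions = dist.sample()              # shape (16,)
    log_probs = dist.log_prob(actions)   # used in the PPO/A2C objective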

forward(obs: Tensor | Dict[str, Tensor] | Tuple[Tensor, ...], action_mask: ArrayOrTensor | None = None) → Distribution

Forward pass of the network.

Parameters:
  • obs (TorchObsType) – Observation input.

  • action_mask (Optional[ArrayOrTensor]) – Mask to apply to the logits. Defaults to None.

Returns:

Distribution over the action space.

Return type:

Distribution
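
A sketch of masking invalid actions during the forward pass, continuing the example above and assuming the convention that a value of 0 in the mask marks an invalid action (the exact mask convention is an assumption):

    # Disallow actions 2 and 3 for every observation in the batch
    action_mask = torch.tensor([[1, 1, 0, 0]]).repeat(16, 1)
    dist = actor(obs, action_mask=action_mask)
    actions = dist.sample()              # only actions 0 and 1 are sampled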

recreate_network(shrink_params: bool = False) → None

Recreates the network with the same parameters as the current network.

Parameters:

shrink_params (bool) – Whether to shrink the parameters of the network. Defaults to False.