EvolvableDistribution

Parameters

class agilerl.networks.actors.EvolvableDistribution(*args: Any, **kwargs: Any)

Wrapper that outputs a distribution over an action space for an evolvable module. It provides methods to sample actions and compute log probabilities, as required by many policy-gradient algorithms such as PPO, A2C, and TRPO.

Parameters:
  • action_space (spaces.Space) – Action space of the environment.

  • network (EvolvableModule) – Network that outputs the logits of the distribution.

  • action_std_init (float) – Initial log standard deviation of the action distribution. Defaults to 0.0.

  • squash_output (bool) – Whether to squash the output to the action space.

  • device (DeviceType) – Device to use for the network.

apply_mask(logits: Tensor, mask: ndarray | Tensor) → Tensor

Apply a mask to the logits.

Parameters:
  • logits (torch.Tensor) – Logits.

  • mask (ArrayOrTensor) – Mask.

Returns:

Logits with mask applied.

Return type:

torch.Tensor
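As an illustration of what masking logits typically involves, the sketch below replaces the logits of disallowed actions with a large negative value, so that they receive zero probability after a softmax. It is written in plain Python rather than PyTorch. Both the mask convention (True means the action is allowed) and the fill value of -1e8 are assumptions, not confirmed details of AgileRL's implementation.

```python
import math


def apply_mask(logits, mask, fill_value=-1e8):
    """Keep logits where the mask is truthy; replace the rest with fill_value."""
    return [logit if allowed else fill_value for logit, allowed in zip(logits, mask)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 0.5]
mask = [True, False, True]  # assumed convention: True = action allowed
masked = apply_mask(logits, mask)
probs = softmax(masked)  # the masked action ends up with zero probability
```

Masking before the softmax, rather than zeroing probabilities afterwards, keeps the remaining probabilities normalised without an extra renormalisation step.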

clone() → EvolvableDistribution

Clones the distribution.

Returns:

Cloned distribution.

Return type:

EvolvableDistribution

entropy() → Tensor

Get the entropy of the action distribution.

Returns:

Entropy of the action distribution.

Return type:

torch.Tensor
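For a discrete action space, the entropy of the action distribution is the Shannon entropy of the action probabilities. The following plain-Python sketch shows that quantity for a few hand-picked distributions. It assumes a categorical distribution measured in nats and is meant only to illustrate the value being returned, not how the library computes it.

```python
import math


def categorical_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


uniform = categorical_entropy([0.25, 0.25, 0.25, 0.25])  # maximal for 4 actions
peaked = categorical_entropy([0.97, 0.01, 0.01, 0.01])   # nearly deterministic
certain = categorical_entropy([1.0, 0.0, 0.0, 0.0])      # fully deterministic
```

A uniform policy over n actions has entropy log(n), and the value shrinks towards zero as the policy becomes more deterministic, which is why an entropy bonus encourages exploration.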

forward(latent: Tensor, action_mask: ndarray | Tensor | None = None, sample: bool = True) → tuple[Tensor, Tensor, Tensor] | tuple[None, None, Tensor]

Forward pass of the network.

Parameters:
  • latent (torch.Tensor) – Latent space representation.

  • action_mask (Optional[ArrayOrTensor]) – Mask to apply to the logits. Defaults to None.

  • sample (bool) – Whether to sample an action or return the mode/mean. Defaults to True.

Returns:

Action and log probability of the action.

Return type:

Union[tuple[torch.Tensor, torch.Tensor, torch.Tensor], tuple[None, None, torch.Tensor]]

get_distribution(logits: Tensor) → TorchDistribution

Get the distribution over the action space given the output of the network.

Parameters:

logits (torch.Tensor) – Output of the network, either logits or probabilities.

Returns:

Distribution over the action space.

Return type:

TorchDistribution

log_prob(action: Tensor) → Tensor

Get the log probability of the action.

Parameters:

action (torch.Tensor) – Action.

Returns:

Log probability of the action.

Return type:

torch.Tensor
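For continuous action spaces, the log probability is commonly that of a diagonal Gaussian parameterised by a mean and a log standard deviation. The sketch below works through that formula in plain Python. It assumes the log probability is summed over action dimensions, and it uses a log standard deviation of 0.0 to mirror the documented default of action_std_init. The function name and signature are illustrative, not AgileRL's API.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log density of a diagonal Gaussian, summed over action dimensions."""
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        z = (a - mu) / std
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


mean = [0.0, 0.5]
log_std = [0.0, 0.0]  # a log std of 0.0 corresponds to a standard deviation of 1.0
at_mean = gaussian_log_prob(mean, mean, log_std)
off_mean = gaussian_log_prob([1.0, -0.5], mean, log_std)
```

With unit standard deviation, moving each of the two action components one unit away from the mean lowers the log probability by exactly 1.0 (0.5 per dimension).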

property net_config: dict[str, dict[str, Any] | Any]

Configuration of the network.

Returns:

Configuration of the network.

Return type:

NetConfigType

DeterministicActor

Parameters

class agilerl.networks.actors.DeterministicActor(*args: Any, **kwargs: Any)

Deterministic actor network for policy-gradient algorithms. Given an observation, it outputs the mean of the action distribution. This is useful for algorithms such as DDPG, SAC, and TD3.

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (spaces.Box | spaces.Discrete) – Action space of the environment.

  • encoder_cls (str | type[EvolvableModule] | None) – Encoder class to use for the network. Defaults to None, in which case the encoder is built automatically from an AgileRL module according to the observation space.

  • encoder_config (NetConfigType) – Configuration of the encoder network.

  • head_config (NetConfigType | None) – Configuration of the network MLP head.

  • min_latent_dim (int) – Minimum dimension of the latent space representation.

  • max_latent_dim (int) – Maximum dimension of the latent space representation.

  • latent_dim (int) – Dimension of the latent space representation.

  • simba (bool) – Whether to use the SimBa architecture for training the network.

  • recurrent (bool) – Whether to use a recurrent network.

  • device (str) – Device to use for the network.

  • random_seed (int | None) – Random seed to use for the network. Defaults to None.

  • encoder_name (str) – Name of the encoder network.

build_network_head(net_config: dict[str, dict[str, Any] | Any] | None = None, **kwargs: Any) → None

Build the head of the network.

Parameters:

net_config (NetConfigType | None) – Configuration of the head.

forward(obs: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]) → Tensor

Forward pass of the network.

Parameters:

obs (TorchObsType) – Observation input.

Returns:

Output of the network.

Return type:

torch.Tensor

recreate_network() → None

Recreates the network.

static rescale_action(action: Tensor, low: Tensor, high: Tensor, output_activation: str) → Tensor

Rescale an action from the network output bounds to the action space bounds [low, high].

Parameters:
  • action (torch.Tensor) – Action as output by the network.

  • low (torch.Tensor) – Minimum action array.

  • high (torch.Tensor) – Maximum action array.

  • output_activation (str) – Output activation function of the network.

Returns:

Action in space bounds [low, high].

Return type:

torch.Tensor
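The rescaling described above is a linear map from the range of the output activation to the action space bounds. The sketch below is a plain-Python approximation operating on lists rather than tensors. The activation names it recognises ("Tanh", "Softsign", "Sigmoid", "Softmax") and the pass-through behaviour for any other activation are assumptions, not a statement of what the library does.

```python
def rescale_action(action, low, high, output_activation):
    """Linearly map each action component from the activation's range to [low, high]."""
    if output_activation in ("Tanh", "Softsign"):
        pre_min, pre_max = -1.0, 1.0
    elif output_activation in ("Sigmoid", "Softmax"):
        pre_min, pre_max = 0.0, 1.0
    else:
        # Unbounded activation: this sketch leaves the action unchanged.
        return list(action)
    span = pre_max - pre_min
    return [
        lo + (hi - lo) * (a - pre_min) / span
        for a, lo, hi in zip(action, low, high)
    ]


low, high = [-2.0, 0.0], [2.0, 10.0]
from_tanh = rescale_action([-1.0, 0.0], low, high, "Tanh")
from_sigmoid = rescale_action([0.5, 1.0], low, high, "Sigmoid")
```

The lower activation bound maps to low, the upper bound to high, and values in between are interpolated per dimension.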

StochasticActor

Parameters

class agilerl.networks.actors.StochasticActor(*args: Any, **kwargs: Any)

Stochastic actor network for policy-gradient algorithms. Given an observation, it constructs a distribution over the action space from the logits output by the network. It provides methods to sample actions and to compute log probabilities and the entropy of the action distribution, as required by many policy-gradient algorithms such as PPO, A2C, and TRPO.

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (spaces.Space) – Action space of the environment.

  • encoder_cls (str | type[EvolvableModule] | None) – Encoder class to use for the network. Defaults to None, in which case the encoder is built automatically from an AgileRL module according to the observation space.

  • encoder_config (NetConfigType | None) – Configuration of the encoder network.

  • head_config (NetConfigType | None) – Configuration of the network MLP head.

  • action_std_init (float) – Initial log standard deviation of the action distribution. Defaults to 0.0.

  • squash_output (bool) – Whether to squash the output to the action space.

  • min_latent_dim (int) – Minimum dimension of the latent space representation.

  • max_latent_dim (int) – Maximum dimension of the latent space representation.

  • latent_dim (int) – Dimension of the latent space representation.

  • simba (bool) – Whether to use the SimBa architecture for training the network.

  • recurrent (bool) – Whether to use a recurrent network.

  • device (str) – Device to use for the network.

  • random_seed (int | None) – Random seed to use for the network. Defaults to None.

  • encoder_name (str) – Name of the encoder network.

action_entropy() → Tensor

Get the entropy of the action distribution.

Returns:

Entropy of the action distribution.

Return type:

torch.Tensor

action_log_prob(action: Tensor) → Tensor

Get the log probability of the action.

Parameters:

action (torch.Tensor) – Action.

Returns:

Log probability of the action.

Return type:

torch.Tensor

build_network_head(net_config: dict[str, dict[str, Any] | Any] | None = None) → None

Build the head of the network.

Parameters:

net_config (NetConfigType | None) – Configuration of the head.

forward(obs: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], action_mask: ndarray | Tensor | None = None) → tuple[Tensor, Tensor]

Forward pass of the network.

Parameters:
  • obs (TorchObsType) – Observation input.

  • action_mask (ArrayOrTensor | None) – Action mask.

Returns:

Action and log probability of the action.

Return type:

tuple[torch.Tensor, torch.Tensor]

recreate_network() → None

Recreates the network with the same parameters as the current network.

scale_action(action: Tensor) → Tensor

Scale the action from [-1, 1] to the action space bounds [low, high].

Parameters:

action (torch.Tensor) – Action.

Returns:

Scaled action.

Return type:

torch.Tensor
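Scaling from [-1, 1] to [low, high] pairs naturally with a squashed policy output. The plain-Python sketch below first squashes unbounded values with tanh and then applies the affine map to the action bounds. The tanh squashing step and the explicit low and high arguments are assumptions made for illustration; the documented method takes only the action.

```python
import math


def scale_action(action, low, high):
    """Affine map of each component from [-1, 1] to [low, high]."""
    return [lo + 0.5 * (a + 1.0) * (hi - lo) for a, lo, hi in zip(action, low, high)]


low, high = [0.0, 0.0, 0.0], [10.0, 10.0, 10.0]

# Exact endpoints and midpoint of the [-1, 1] interval.
reference = scale_action([-1.0, 0.0, 1.0], low, high)

# Unbounded samples are squashed into (-1, 1) before scaling (assumed step).
raw = [0.0, 3.0, -3.0]
squashed = [math.tanh(x) for x in raw]
scaled = scale_action(squashed, low, high)
```

Because tanh never reaches -1 or 1 exactly, the scaled actions always lie strictly inside the bounds.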