EvolvableDistribution

Parameters

class agilerl.networks.actors.EvolvableDistribution(*args: Any, **kwargs: Any)

Wraps an evolvable module so that it outputs a distribution over an action space. It provides methods to sample actions and compute their log probabilities, as required by policy-gradient algorithms such as PPO, A2C and TRPO.

Parameters:

action_space (spaces.Space) – Action space of the environment.
network (EvolvableModule) – Network that outputs the logits of the distribution.
action_std_init (float) – Initial log standard deviation of the action distribution. Defaults to 0.0.
squash_output (bool) – Whether to squash the output to the action space.
device (DeviceType) – Device to use for the network.

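apply_mask(logits: Tensor, mask: ndarray | Tensor) → Tensor

Apply an action mask to the logits so that masked-out actions are excluded from the distribution.

Parameters:

logits (torch.Tensor) – Logits output by the network.
mask (ArrayOrTensor) – Mask indicating which actions are available.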

Returns:

Logits with mask applied.

Return type:

torch.Tensor
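A minimal sketch of logit masking, assuming the mask is True (or 1) for valid actions and that masked logits are replaced by a very large negative value so that their probability after the softmax is effectively zero. The helper `mask_logits` and the choice of fill value are illustrative assumptions, not AgileRL's implementation.

```python
import torch


def mask_logits(logits: torch.Tensor, mask) -> torch.Tensor:
    """Hypothetical helper: suppress the logits of invalid actions.

    Assumes `mask` is True/1 for valid actions and False/0 for invalid ones.
    """
    mask = torch.as_tensor(mask, dtype=torch.bool, device=logits.device)
    # Most negative finite value for the dtype: softmax maps it to ~0 probability
    fill_value = torch.finfo(logits.dtype).min
    return torch.where(mask, logits, torch.full_like(logits, fill_value))


logits = torch.tensor([[1.0, 2.0, 3.0]])
mask = [[True, False, True]]  # action 1 is not available
masked = mask_logits(logits, mask)
probs = torch.softmax(masked, dim=-1)
print(probs)
```

Using a finite minimum rather than `-inf` avoids NaNs if every action in a row were ever masked.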

clone() → EvolvableDistribution

Clones the distribution.

Returns: Cloned distribution.
Return type: EvolvableDistribution

entropy() → Tensor

Get the entropy of the action distribution.

Returns: Entropy of the action distribution.
Return type: torch.Tensor

forward(latent: Tensor, action_mask: ndarray | Tensor | None = None, sample: bool = True) → tuple[Tensor, Tensor, Tensor] | tuple[None, None, Tensor]

Forward pass of the network.

Parameters:

latent (torch.Tensor) – Latent space representation.
action_mask (Optional[ArrayOrTensor]) – Mask to apply to the logits. Defaults to None.
sample (bool) – Whether to sample an action or return the mode/mean. Defaults to True.

Returns:

The action, the log probability of the action, and the entropy of the action distribution.

Return type:

Union[tuple[torch.Tensor, torch.Tensor, torch.Tensor], tuple[None, None, torch.Tensor]]
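The forward pass for a discrete action space can be sketched as: mask the logits, build a categorical distribution, then pick an action and evaluate it. This is a simplified illustration that assumes a `Discrete` action space, a `torch.distributions.Categorical`, and a boolean mask of valid actions. Following the `sample` parameter description, it returns the mode when `sample=False`. The function `distribution_forward` is hypothetical and does not reproduce every return variant of the real method.

```python
import torch
from torch.distributions import Categorical


def distribution_forward(logits, action_mask=None, sample=True):
    """Hypothetical sketch: logits -> (optional mask) -> distribution -> action."""
    if action_mask is not None:
        mask = torch.as_tensor(action_mask, dtype=torch.bool)
        # Invalid actions get the most negative finite logit, i.e. ~0 probability
        logits = logits.masked_fill(~mask, torch.finfo(logits.dtype).min)
    dist = Categorical(logits=logits)
    # Either sample stochastically or take the mode (most likely action)
    action = dist.sample() if sample else dist.probs.argmax(dim=-1)
    log_prob = dist.log_prob(action)
    entropy = dist.entropy()
    return action, log_prob, entropy


torch.manual_seed(0)
logits = torch.tensor([[0.5, 1.5, -1.0]])
action, log_prob, entropy = distribution_forward(
    logits, action_mask=[[True, True, False]]
)
print(action, log_prob, entropy)
```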

get_distribution(logits: Tensor) → TorchDistribution

Get the distribution over the action space given the output of the network.

Parameters: logits (torch.Tensor) – Output of the network, either logits or probabilities.
Returns: Distribution over the action space.
Return type: TorchDistribution
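For a continuous (`Box`) action space, one common construction, assumed here for illustration, treats the network output as the mean of a diagonal Gaussian whose log standard deviation is a learnable parameter initialised to `action_std_init`. With the default of 0.0, the initial standard deviation is exp(0) = 1. The class `DiagGaussianHead` is hypothetical and is not part of AgileRL.

```python
import torch
from torch import nn
from torch.distributions import Independent, Normal


class DiagGaussianHead(nn.Module):
    """Hypothetical sketch: map network outputs (means) to a diagonal Gaussian."""

    def __init__(self, action_dim: int, action_std_init: float = 0.0):
        super().__init__()
        # action_std_init is interpreted as the initial *log* standard deviation
        self.log_std = nn.Parameter(torch.full((action_dim,), action_std_init))

    def get_distribution(self, mean: torch.Tensor) -> Independent:
        std = self.log_std.exp().expand_as(mean)
        # Independent(..., 1) sums log-probs and entropies over action dimensions
        return Independent(Normal(mean, std), 1)


head = DiagGaussianHead(action_dim=2)
mean = torch.zeros(4, 2)  # batch of 4 network outputs
dist = head.get_distribution(mean)
action = dist.sample()
log_prob = dist.log_prob(action)
print(action.shape, log_prob.shape)  # torch.Size([4, 2]) torch.Size([4])
```

Summing over action dimensions gives one log probability per batch element, which is the shape policy-gradient losses typically expect.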

log_prob(action: Tensor) → Tensor

Get the log probability of the action.

Parameters: action (torch.Tensor) – Action.
Returns: Log probability of the action.
Return type: torch.Tensor

property net_config: dict[str, dict[str, Any] | Any]

Configuration of the network.

Returns: Configuration of the network.
Return type: NetConfigType

DeterministicActor

Parameters

class agilerl.networks.actors.DeterministicActor(*args: Any, **kwargs: Any)

Deterministic actor network for policy-gradient algorithms. Given an observation, it outputs a single action, which can be viewed as the mean of the action distribution. This is useful for deterministic policy-gradient algorithms such as DDPG and TD3.

Parameters:

observation_space (spaces.Space) – Observation space of the environment.
action_space (spaces.Box | spaces.Discrete) – Action space of the environment.
encoder_cls (str | type[EvolvableModule] | None) – Encoder class to use for the network. Defaults to None, in which case an AgileRL encoder is built automatically based on the observation space.
encoder_config (NetConfigType) – Configuration of the encoder network.
head_config (NetConfigType | None) – Configuration of the network MLP head.
min_latent_dim (int) – Minimum dimension of the latent space representation.
max_latent_dim (int) – Maximum dimension of the latent space representation.
latent_dim (int) – Dimension of the latent space representation.
simba (bool) – Whether to use the SimBa architecture for the network.
recurrent (bool) – Whether to use a recurrent network.
device (str) – Device to use for the network.
random_seed (int | None) – Random seed to use for the network. Defaults to None.
encoder_name (str) – Name of the encoder network.
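The structure described above, an encoder that produces a latent representation followed by an MLP head, can be illustrated with a minimal PyTorch sketch. `TinyDeterministicActor` is hypothetical: it hard-codes one linear encoder layer and a tanh-bounded head instead of building them from `observation_space`, `encoder_config` and `head_config`, so it only demonstrates that the same observation always yields the same action.

```python
import torch
from torch import nn


class TinyDeterministicActor(nn.Module):
    """Hypothetical sketch of an encoder + MLP head deterministic actor."""

    def __init__(self, obs_dim: int, action_dim: int, latent_dim: int = 32):
        super().__init__()
        # Encoder: observation -> latent representation of size latent_dim
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        # Head: latent -> action, bounded to [-1, 1] by tanh
        self.head = nn.Sequential(nn.Linear(latent_dim, action_dim), nn.Tanh())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(obs))


torch.manual_seed(0)
actor = TinyDeterministicActor(obs_dim=3, action_dim=2)
obs = torch.randn(5, 3)
# No sampling is involved, so repeated calls give identical actions
a1, a2 = actor(obs), actor(obs)
```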

build_network_head(net_config: dict[str, dict[str, Any] | Any] | None = None, **kwargs: Any) → None

Build the head of the network.

Parameters: net_config (NetConfigType | None) – Configuration of the head.

forward(obs: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor]) → Tensor

Forward pass of the network.

Parameters: obs (TorchObsType) – Observation input.
Returns: Output of the network.
Return type: torch.Tensor

recreate_network() → None

Recreates the network.

static rescale_action(action: Tensor, low: Tensor, high: Tensor, output_activation: str) → Tensor

Rescale an action from the network output bounds to the action space bounds [low, high].

Parameters:

action (torch.Tensor) – Action as outputted by the network.
low (torch.Tensor) – Minimum action array.
high (torch.Tensor) – Maximum action array.
output_activation (str) – Output activation function of the network.

Returns:

Action in space bounds [low, high].

Return type:

torch.Tensor
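A minimal sketch of this rescaling under stated assumptions: the hypothetical `rescale_action_sketch` assumes `output_activation` names such as "Tanh" (outputs in [-1, 1]) and "Sigmoid" (outputs in [0, 1]), and simply clips the outputs of any other activation. The activation names AgileRL actually accepts, and how it treats unbounded outputs, may differ.

```python
import torch


def rescale_action_sketch(action, low, high, output_activation):
    """Hypothetical sketch: map bounded network outputs to [low, high].

    Assumes "Tanh" outputs lie in [-1, 1] and "Sigmoid" outputs in [0, 1];
    any other activation is treated as unbounded and only clipped.
    """
    if output_activation == "Tanh":
        # [-1, 1] -> [0, 1] -> [low, high]
        return low + (action + 1.0) * 0.5 * (high - low)
    if output_activation == "Sigmoid":
        # [0, 1] -> [low, high]
        return low + action * (high - low)
    # Element-wise clip to the action space bounds
    return torch.maximum(torch.minimum(action, high), low)


low = torch.tensor([-2.0, 0.0])
high = torch.tensor([2.0, 10.0])
scaled = rescale_action_sketch(torch.tensor([0.0, 1.0]), low, high, "Tanh")
print(scaled)
```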

StochasticActor

Parameters

class agilerl.networks.actors.StochasticActor(*args: Any, **kwargs: Any)

Stochastic actor network for policy-gradient algorithms. Given an observation, it constructs a distribution over the action space from the logits output by the network. It provides methods to sample actions and to compute the log probabilities and entropy of the action distribution, as required by policy-gradient algorithms such as PPO, A2C and TRPO.

Parameters:

observation_space (spaces.Space) – Observation space of the environment.
action_space (spaces.Space) – Action space of the environment.
encoder_cls (str | type[EvolvableModule] | None) – Encoder class to use for the network. Defaults to None, in which case an AgileRL encoder is built automatically based on the observation space.
encoder_config (NetConfigType | None) – Configuration of the encoder network.
head_config (NetConfigType | None) – Configuration of the network MLP head.
action_std_init (float) – Initial log standard deviation of the action distribution. Defaults to 0.0.
squash_output (bool) – Whether to squash the output to the action space.
min_latent_dim (int) – Minimum dimension of the latent space representation.
max_latent_dim (int) – Maximum dimension of the latent space representation.
latent_dim (int) – Dimension of the latent space representation.
simba (bool) – Whether to use the SimBa architecture for the network.
recurrent (bool) – Whether to use a recurrent network.
device (str) – Device to use for the network.
random_seed (int | None) – Random seed to use for the network. Defaults to None.
encoder_name (str) – Name of the encoder network.

action_entropy() → Tensor

Get the entropy of the action distribution.

Returns: Entropy of the action distribution.
Return type: torch.Tensor

action_log_prob(action: Tensor) → Tensor

Get the log probability of the action.

Parameters: action (torch.Tensor) – Action.
Returns: Log probability of the action.
Return type: torch.Tensor

build_network_head(net_config: dict[str, dict[str, Any] | Any] | None = None) → None

Build the head of the network.

Parameters: net_config (NetConfigType | None) – Configuration of the head.

forward(obs: TorchObsType, action_mask: ArrayOrTensor | None) → tuple[torch.Tensor, torch.Tensor]

Forward pass of the network.

Parameters:

obs (TorchObsType) – Observation input.
action_mask (ArrayOrTensor | None) – Action mask.

Returns:

Action and log probability of the action.

Return type:

tuple[torch.Tensor, torch.Tensor]

recreate_network() → None

Recreates the network with the same parameters as the current network.

scale_action(action: Tensor) → Tensor

Scale the action from [-1, 1] to the action space bounds [low, high].

Parameters: action (torch.Tensor) – Action.
Returns: Scaled action.
Return type: torch.Tensor
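Written out per action dimension i, and assuming the input already lies in [-1, 1], the scaling from [-1, 1] to [low, high] is the affine map below. The endpoints -1 and 1 map to low_i and high_i, and 0 maps to the midpoint. For example, with low = -2, high = 2 and a = 0.5, the scaled action is -2 + (1.5)(4)/2 = 1.

```latex
\tilde{a}_i = \mathrm{low}_i + \frac{(a_i + 1)\,(\mathrm{high}_i - \mathrm{low}_i)}{2},
\qquad a_i \in [-1, 1] \;\Rightarrow\; \tilde{a}_i \in [\mathrm{low}_i, \mathrm{high}_i]
```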