EvolvableDistribution¶
- class agilerl.networks.actors.EvolvableDistribution(*args, **kwargs)¶
Wrapper that outputs a distribution over an action space for an evolvable module. It provides methods to sample actions and compute log probabilities, which are required by many policy-gradient algorithms such as PPO, A2C, and TRPO. A usage sketch is given at the end of this section.
- Parameters:
action_space (spaces.Space) – Action space of the environment.
network (EvolvableModule) – Network that outputs the logits of the distribution.
- clone() EvolvableDistribution ¶
Clones the distribution.
- Returns:
Cloned distribution.
- Return type:
EvolvableDistribution
- forward(latent: Tensor, action_mask: ArrayOrTensor | None = None) Distribution ¶
Forward pass of the network.
- Parameters:
latent (torch.Tensor) – Latent encoding output by the network.
action_mask (Optional[ArrayOrTensor]) – Mask to apply to the logits. Defaults to None.
- Returns:
Distribution over the action space.
- Return type:
Distribution
- get_distribution(probs: Tensor, log_std: Tensor | None = None) Distribution ¶
Get the distribution over the action space from the network outputs.
- Parameters:
probs (torch.Tensor) – Logits output by the network.
log_std (Optional[torch.Tensor]) – Log standard deviation of the action distribution. Defaults to None.
- Returns:
Distribution over the action space.
- Return type:
Distribution
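As referenced above, the following is a minimal usage sketch. It assumes the constructor parameters and forward signature documented here; the EvolvableMLP import path and its num_inputs/num_outputs/hidden_size arguments are assumptions and may differ between AgileRL versions:

import torch
from gymnasium import spaces

from agilerl.modules import EvolvableMLP          # import path is an assumption
from agilerl.networks.actors import EvolvableDistribution

action_space = spaces.Discrete(4)

# Head that maps a 32-dimensional latent encoding to one logit per action.
head = EvolvableMLP(num_inputs=32, num_outputs=action_space.n, hidden_size=[64])

dist_head = EvolvableDistribution(action_space=action_space, network=head)

latent = torch.randn(8, 32)            # batch of latent encodings
dist = dist_head(latent)               # torch.distributions.Distribution
actions = dist.sample()                # sampled actions, shape (8,)
log_probs = dist.log_prob(actions)     # log-probabilities for PPO/A2C-style losses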
DeterministicActor¶
- class agilerl.networks.actors.DeterministicActor(*args, **kwargs)¶
Deterministic actor network for policy-gradient algorithms. Given an observation, it outputs the mean of the action distribution. This is useful for algorithms such as DDPG, SAC, and TD3. A construction sketch is given at the end of this section.
- Parameters:
observation_space (spaces.Space) – Observation space of the environment.
action_space (spaces.Space) – Action space of the environment.
encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, whereby it is automatically built using an AgileRL module according to the observation space.
encoder_config (ConfigType) – Configuration of the encoder network.
head_config (Optional[ConfigType]) – Configuration of the network MLP head.
min_latent_dim (int) – Minimum dimension of the latent space representation.
max_latent_dim (int) – Maximum dimension of the latent space representation.
n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.
latent_dim (int) – Dimension of the latent space representation.
device (str) – Device to use for the network.
- build_network_head(net_config: IsDataclass | Dict[str, Any] | None = None) None ¶
Builds the head of the network.
- Parameters:
net_config (Optional[ConfigType]) – Configuration of the head.
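As referenced above, the following is a minimal construction sketch. It assumes the constructor parameters listed here; the space shapes and latent_dim value are illustrative, and the forward call returning the action tensor directly is an assumption based on the class description:

import torch
from gymnasium import spaces

from agilerl.networks.actors import DeterministicActor

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(24,))
action_space = spaces.Box(low=-1.0, high=1.0, shape=(4,))

actor = DeterministicActor(
    observation_space=observation_space,
    action_space=action_space,
    latent_dim=32,      # illustrative latent dimension
    device="cpu",
)

obs = torch.randn(8, 24)    # batch of observations
actions = actor(obs)        # assumed to return deterministic actions, shape (8, 4)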
StochasticActor¶
- class agilerl.networks.actors.StochasticActor(*args, **kwargs)¶
Stochastic actor network for policy-gradient algorithms. Given an observation, it outputs a distribution over the action space. This is useful for on-policy policy-gradient algorithms such as PPO, A2C, and TRPO. A usage sketch is given at the end of this section.
- Parameters:
observation_space (spaces.Space) – Observation space of the environment.
action_space (spaces.Space) – Action space of the environment.
encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, whereby it is automatically built using an AgileRL module according to the observation space.
encoder_config (ConfigType) – Configuration of the encoder network.
head_config (Optional[ConfigType]) – Configuration of the network MLP head.
n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.
latent_dim (int) – Dimension of the latent space representation.
device (str) – Device to use for the network.
- forward(obs: Tensor | Dict[str, Tensor] | Tuple[Tensor, ...], action_mask: ArrayOrTensor | None = None) Distribution ¶
Forward pass of the network.
- Parameters:
obs (TorchObsType) – Observation input.
action_mask (Optional[ArrayOrTensor]) – Mask to apply to the logits. Defaults to None.
- Returns:
Distribution over the action space.
- Return type:
Distribution
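As referenced above, the following is a minimal usage sketch on a discrete action space. It assumes the constructor parameters and the Distribution return type documented here; the observation shape and the all-ones action mask are illustrative:

import torch
from gymnasium import spaces

from agilerl.networks.actors import StochasticActor

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(16,))
action_space = spaces.Discrete(5)

actor = StochasticActor(
    observation_space=observation_space,
    action_space=action_space,
    latent_dim=32,      # illustrative latent dimension
    device="cpu",
)

obs = torch.randn(8, 16)                    # batch of observations
mask = torch.ones(8, 5, dtype=torch.bool)   # all actions valid in this example
dist = actor(obs, action_mask=mask)         # torch.distributions.Distribution
actions = dist.sample()
log_probs = dist.log_prob(actions)          # used in PPO / A2C / TRPO objectives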