ValueNetwork

Parameters

class agilerl.networks.value_networks.ValueNetwork(*args: Any, **kwargs: Any)

Value functions are used in reinforcement learning to estimate the expected return from a state. For any given observation, the network predicts a single scalar value that represents the expected discounted return from that state. Value networks of this kind are used, for example, in PPO.
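To illustrate the quantity being estimated, here is a minimal sketch of the discounted return G_t = r_t + γ·G_{t+1} for each step of a finished episode. The helper `discounted_returns` is hypothetical and not part of AgileRL. A value network is trained so that its scalar output for the observation at step t approximates the expectation of this quantity.

```python
def discounted_returns(rewards: list[float], gamma: float = 0.99) -> list[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of a finished episode.

    Hypothetical helper for illustration only; not part of AgileRL.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # → [1.5, 1.0, 2.0]
```

In PPO, value estimates of this kind are typically compared against such returns (or used within GAE) to compute advantages.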

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • encoder_cls (str | type[EvolvableModule] | None) – Encoder class to use for the network. Defaults to None, in which case an AgileRL module is built automatically according to the observation space.

  • encoder_config (NetConfigType) – Configuration of the encoder.

  • head_config (NetConfigType | None) – Configuration of the head.

  • min_latent_dim (int) – Minimum latent dimension.

  • max_latent_dim (int) – Maximum latent dimension.

  • latent_dim (int) – Latent dimension.

  • simba (bool) – Whether to use the SimBa architecture for training the network.

  • recurrent (bool) – Whether to use a recurrent network.

  • device (str) – Device to run the network on.

  • random_seed (int | None) – Random seed to use for the network. Defaults to None.
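The three latent-dimension parameters appear to be related: presumably latent_dim should lie within [min_latent_dim, max_latent_dim], with the bounds limiting how far the latent space can grow or shrink. That reading is an assumption. The hypothetical helper `resolve_latent_dim` below only illustrates such a bounds check with a clamping fallback; AgileRL's own handling may differ.

```python
def resolve_latent_dim(latent_dim: int, min_latent_dim: int, max_latent_dim: int) -> int:
    """Clamp latent_dim into [min_latent_dim, max_latent_dim].

    Hypothetical helper (assumption); AgileRL's own handling of these bounds may differ.
    """
    if min_latent_dim > max_latent_dim:
        raise ValueError("min_latent_dim must not exceed max_latent_dim")
    # Clip from above first, then from below.
    return max(min_latent_dim, min(latent_dim, max_latent_dim))


print(resolve_latent_dim(256, min_latent_dim=8, max_latent_dim=128))  # → 128
```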

build_network_head(net_config: dict[str, dict[str, Any] | Any]) → None

Build the head of the network.

Parameters:

net_config (NetConfigType) – Configuration of the head.
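The head presumably sits on top of the encoder: it receives the latent_dim-dimensional encoding and must end in a single output unit, since the network predicts one scalar per observation. As a sketch, assuming an MLP head whose config lists hidden layer widths under a "hidden_size" key (an assumption about the config format), the hypothetical helper `head_layer_shapes` computes the (in_features, out_features) pair of each linear layer.

```python
from typing import Any


def head_layer_shapes(latent_dim: int, head_config: dict[str, Any]) -> list[tuple[int, int]]:
    """Return (in_features, out_features) for each linear layer of an MLP value head.

    Hypothetical helper; assumes hidden widths are stored under "hidden_size".
    The final layer always maps to a single scalar value.
    """
    widths = [latent_dim, *head_config.get("hidden_size", []), 1]
    # Pair each width with the next one to get consecutive layer shapes.
    return list(zip(widths[:-1], widths[1:]))


print(head_layer_shapes(32, {"hidden_size": [64, 64]}))  # → [(32, 64), (64, 64), (64, 1)]
```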

forward(x: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], hidden_state: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | None = None) → Tensor | tuple[Tensor, Tensor]

Forward pass of the network.

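Parameters:

x (torch.Tensor, TensorDict, tuple[torch.Tensor, ...], or dict[str, torch.Tensor]) – Input observation(s).

hidden_state (torch.Tensor, TensorDict, tuple[torch.Tensor, ...], dict[str, torch.Tensor], or None) – Hidden state for recurrent networks. Defaults to None.

Returns:

Output tensor with the value estimate, or a tuple of two tensors.

Return type:

torch.Tensor | tuple[torch.Tensor, torch.Tensor]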
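Conceptually, the forward pass maps an observation through the encoder to a latent vector, and the output dense layer then reduces that vector to one value per observation. The dependency-free sketch below mirrors that data flow for a batch of flat observations. The single Linear + ReLU encoder, the weights, and the names `linear`, `relu`, and `value_forward` are hypothetical stand-ins for the real PyTorch modules, and the recurrent hidden-state path is omitted.

```python
def linear(x: list[float], weight: list[list[float]], bias: list[float]) -> list[float]:
    """Affine map y = W x + b, with weight stored as (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: list[float]) -> list[float]:
    return [max(0.0, v) for v in x]


def value_forward(
    batch: list[list[float]],
    enc_w: list[list[float]],
    enc_b: list[float],
    out_w: list[list[float]],
    out_b: list[float],
) -> list[float]:
    """Hypothetical encoder (Linear + ReLU) -> latent -> output dense layer -> scalar."""
    values = []
    for obs in batch:
        latent = relu(linear(obs, enc_w, enc_b))
        (value,) = linear(latent, out_w, out_b)  # output layer has exactly one unit
        values.append(value)
    return values


enc_w, enc_b = [[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]
out_w, out_b = [[0.5, 2.0]], [1.0]
print(value_forward([[2.0, 3.0], [-1.0, -4.0]], enc_w, enc_b, out_w, out_b))  # → [2.0, 9.0]
```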

get_output_dense() → Linear

Return the output dense layer of the network.

Returns:

Output dense layer.

Return type:

torch.nn.Linear

recreate_network() → None

Recreate the network from its current configuration.