ValueNetwork
Parameters
- class agilerl.networks.value_networks.ValueNetwork(*args: Any, **kwargs: Any)
Value functions are used in reinforcement learning to estimate the expected return from a state. For any given observation, the network predicts a single scalar that represents the expected discounted return from that state. Value networks are used in, for example, PPO.
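To make "discounted return" concrete, here is a minimal sketch of the quantity a value function estimates in expectation. The helper name `discounted_return` and the discount factor `gamma` are illustrative assumptions, not part of the AgileRL API.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards, each weighted by gamma ** (steps into the future).

    Hypothetical helper for illustration only; not an AgileRL function.
    """
    g = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three rewards of 1.0 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

A value network is trained so that its scalar output for an observation approximates the expectation of this sum over trajectories starting from that observation.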
- Parameters:
observation_space (spaces.Space) – Observation space of the environment.
encoder_cls (str | type[EvolvableModule] | None) – Encoder class to use for the network. Defaults to None, in which case the encoder is built automatically from an AgileRL module according to the observation space.
encoder_config (NetConfigType) – Configuration of the encoder.
head_config (NetConfigType | None) – Configuration of the head.
min_latent_dim (int) – Minimum latent dimension.
max_latent_dim (int) – Maximum latent dimension.
latent_dim (int) – Latent dimension.
simba (bool) – Whether to use the SimBa architecture for training the network.
recurrent (bool) – Whether to use a recurrent network.
device (str) – Device to run the network on.
random_seed (int | None) – Random seed to use for the network. Defaults to None.
- build_network_head(net_config: dict[str, dict[str, Any] | Any]) → None
Build the head of the network.
- Parameters:
net_config (NetConfigType) – Configuration of the head.
- forward(x: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor], hidden_state: Tensor | TensorDict | tuple[Tensor, ...] | dict[str, Tensor] | None = None) → Tensor | tuple[Tensor, Tensor]
Forward pass of the network.
- get_output_dense() → Linear
Return the output dense layer of the network.
- Returns:
Output dense layer.
- Return type:
torch.nn.Linear
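The encoder and head structure described above, along with the roles of `forward` and `get_output_dense`, can be illustrated with a small plain-PyTorch sketch. This is not AgileRL's implementation: the class `TinyValueNet` and its constructor arguments are hypothetical stand-ins, and it assumes a flat vector observation and a non-recurrent network.

```python
import torch
from torch import nn


class TinyValueNet(nn.Module):
    """Hypothetical stand-in for illustration; not AgileRL's ValueNetwork."""

    def __init__(self, obs_dim: int, latent_dim: int = 32) -> None:
        super().__init__()
        # Encoder: maps an observation to a latent vector of size latent_dim.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ReLU())
        # Head: maps the latent vector to a single scalar value estimate.
        self.head = nn.Linear(latent_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One value per observation in the batch.
        return self.head(self.encoder(x))

    def get_output_dense(self) -> nn.Linear:
        # The final linear layer that produces the scalar value.
        return self.head


torch.manual_seed(0)
net = TinyValueNet(obs_dim=4)
values = net(torch.randn(8, 4))  # batch of 8 observations
print(values.shape)  # torch.Size([8, 1])
```

Exposing the output layer separately is useful when, for example, the final layer needs its own initialisation or inspection, without touching the encoder.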