QNetwork

class agilerl.networks.q_networks.QNetwork(*args, **kwargs)

Q Networks correspond to state-action value functions in deep reinforcement learning. Given a state, they predict the value of each action that can be taken in that state. By default, an encoder that extracts features from the input is built from the passed observation space using the AgileRL evolvable modules. The QNetwork then uses an EvolvableMLP as its head to predict a value for each possible discrete action in the given state.
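Example (a minimal sketch, not taken from the AgileRL documentation): constructing a QNetwork for a hypothetical environment with an 8-dimensional vector observation space and 4 discrete actions, then computing Q values for a batch of observations. Only documented constructor arguments are used; the shapes in the comments are assumptions.

import torch
from gymnasium import spaces

from agilerl.networks.q_networks import QNetwork

# Hypothetical single-agent environment spaces
observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Discrete(4)

q_net = QNetwork(
    observation_space=observation_space,
    action_space=action_space,
    device="cpu",
)

obs = torch.randn(32, 8)            # batch of 32 vector observations
q_values = q_net(obs)               # one value per discrete action, assumed shape (32, 4)
greedy_actions = q_values.argmax(dim=-1)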

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (DiscreteSpace) – Action space of the environment.

  • encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, in which case it is automatically built from the observation space using an AgileRL evolvable module.

  • encoder_config (ConfigType) – Configuration of the encoder network.

  • head_config (Optional[ConfigType]) – Configuration of the network MLP head.

  • min_latent_dim (int) – Minimum dimension of the latent space representation. Defaults to 8.

  • max_latent_dim (int) – Maximum dimension of the latent space representation. Defaults to 128.

  • n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.

  • latent_dim (int) – Dimension of the latent space representation.

  • device (str) – Device to use for the network.

build_network_head(net_config: Dict[str, Any]) → None

Builds the head of the network based on the passed configuration.

Parameters:

net_config (Dict[str, Any]) – Configuration of the network head.

forward(obs: Tensor | Dict[str, Tensor] | Tuple[Tensor, ...]) → Tensor

Forward pass of the Q network.

Parameters:

obs (TorchObsType) – Input to the network.

Returns:

Output of the network.

Return type:

torch.Tensor

recreate_network() → None

Recreates the network.

RainbowQNetwork

class agilerl.networks.q_networks.RainbowQNetwork(*args, **kwargs)

RainbowQNetwork is an extension of the QNetwork that incorporates the Rainbow DQN improvements from “Rainbow: Combining Improvements in Deep Reinforcement Learning” (Hessel et al., 2017).

Paper: https://arxiv.org/abs/1710.02298
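Example (a minimal sketch, not taken from the AgileRL documentation): building a RainbowQNetwork with a distributional support of 51 atoms over an assumed value range of [-10, 10], then querying expected Q values as well as per-atom log-probabilities via the q and log flags of forward(). Shapes and the value range are assumptions.

import torch
from gymnasium import spaces

from agilerl.networks.q_networks import RainbowQNetwork

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Discrete(4)

# Distributional support: 51 atoms spanning an assumed value range [-10, 10]
num_atoms = 51
support = torch.linspace(-10.0, 10.0, num_atoms)

rainbow_q_net = RainbowQNetwork(
    observation_space=observation_space,
    action_space=action_space,
    support=support,
    num_atoms=num_atoms,
    device="cpu",
)

obs = torch.randn(32, 8)
q_values = rainbow_q_net(obs)                       # expected Q values, assumed shape (32, 4)
log_probs = rainbow_q_net(obs, q=False, log=True)   # per-action log-probabilities over the atoms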

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (DiscreteSpace) – Action space of the environment.

  • encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, in which case it is automatically built from the observation space using an AgileRL evolvable module.

  • encoder_config (ConfigType) – Configuration of the encoder network.

  • support (torch.Tensor) – Support for the distributional value function.

  • num_atoms (int) – Number of atoms in the distributional value function. Defaults to 51.

  • head_config (Optional[ConfigType]) – Configuration of the network MLP head.

  • min_latent_dim (int) – Minimum dimension of the latent space representation. Defaults to 8.

  • max_latent_dim (int) – Maximum dimension of the latent space representation. Defaults to 128.

  • n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.

  • latent_dim (int) – Dimension of the latent space representation.

  • device (str) – Device to use for the network.

build_network_head(net_config: Dict[str, Any]) → None

Builds the value and advantage heads of the network based on the passed configuration.

Parameters:

net_config (Dict[str, Any]) – Configuration of the network head.

forward(obs: Tensor | Dict[str, Tensor] | Tuple[Tensor, ...], q: bool = True, log: bool = False) → Tensor

Forward pass of the Rainbow Q network.

Parameters:
  • obs (torch.Tensor, dict[str, torch.Tensor], or tuple[torch.Tensor, ...]) – Input to the network.

  • q (bool) – Whether to return Q values. Defaults to True.

  • log (bool) – Whether to return log probabilities. Defaults to False.

Returns:

Output of the network.

Return type:

torch.Tensor

recreate_network() → None

Recreates the network.

ContinuousQNetwork

class agilerl.networks.q_networks.ContinuousQNetwork(*args, **kwargs)

ContinuousQNetwork is an extension of the QNetwork that is used for continuous action spaces. This is used in off-policy algorithms like DDPG and TD3. The network predicts the Q value for a given state-action pair.

Paper: https://arxiv.org/abs/1509.02971
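Example (a minimal sketch, not taken from the AgileRL documentation): using a ContinuousQNetwork as a critic for a hypothetical environment with an 8-dimensional observation space and a 2-dimensional continuous action space. The network returns the Q value for each state-action pair in the batch; the output shape is an assumption.

import torch
from gymnasium import spaces

from agilerl.networks.q_networks import ContinuousQNetwork

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))

critic = ContinuousQNetwork(
    observation_space=observation_space,
    action_space=action_space,
    device="cpu",
)

obs = torch.randn(32, 8)                 # batch of 32 observations
actions = torch.rand(32, 2) * 2.0 - 1.0  # batch of actions in [-1, 1]
q_values = critic(obs, actions)          # Q(s, a) for each state-action pair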

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (spaces.Box) – Action space of the environment.

  • encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, in which case it is automatically built from the observation space using an AgileRL evolvable module.

  • encoder_config (ConfigType) – Configuration of the encoder network.

  • head_config (Optional[ConfigType]) – Configuration of the network MLP head.

  • min_latent_dim (int) – Minimum dimension of the latent space representation. Defaults to 8.

  • max_latent_dim (int) – Maximum dimension of the latent space representation. Defaults to 128.

  • n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.

  • latent_dim (int) – Dimension of the latent space representation.

  • simba (bool) – Whether to use SimBA for the network. Defaults to False.

  • normalize_actions (bool) – Whether to normalize the actions. Defaults to False; it is set to True if the encoder contains nn.LayerNorm layers.

  • device (str) – Device to use for the network.

build_network_head(net_config: IsDataclass | Dict[str, Any] | None = None) → None

Builds the head of the network.

Parameters:

net_config (Optional[ConfigType]) – Configuration of the network head.

forward(obs: Tensor | Dict[str, Tensor] | Tuple[Tensor, ...], actions: ArrayLike | Tensor) → Tensor

Forward pass of the network.

Parameters:
  • obs (torch.Tensor, dict[str, torch.Tensor], or tuple[torch.Tensor, ...]) – Input tensor.

  • actions (numpy.typing.ArrayLike or torch.Tensor) – Actions tensor.

Returns:

Output tensor.

Return type:

torch.Tensor

recreate_network() → None

Recreates the network.