QNetwork

class agilerl.networks.q_networks.QNetwork(*args, **kwargs)

Q Networks correspond to state-action value functions in deep reinforcement learning. Given a state, they predict the value of each action that can be taken in that state. By default, an encoder that extracts features from the input is built from the passed observation space using the AgileRL evolvable modules. The QNetwork then uses an EvolvableMLP as its head to predict a value for each possible discrete action in the given state.
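Example (a minimal sketch, not taken from the AgileRL documentation): constructing a QNetwork for a hypothetical environment with an 8-dimensional vector observation space and 4 discrete actions, then computing Q values for a batch of observations. Only documented constructor arguments are used; the shapes in the comments are assumptions.

import torch
from gymnasium import spaces

from agilerl.networks.q_networks import QNetwork

# Hypothetical single-agent environment spaces
observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Discrete(4)

q_net = QNetwork(
    observation_space=observation_space,
    action_space=action_space,
    device="cpu",
)

obs = torch.randn(32, 8)            # batch of 32 vector observations
q_values = q_net(obs)               # one value per discrete action, assumed shape (32, 4)
greedy_actions = q_values.argmax(dim=-1)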

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (DiscreteSpace) – Action space of the environment.

  • encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, in which case it is automatically built from the observation space using an AgileRL evolvable module.

  • encoder_config (ConfigType) – Configuration of the encoder network.

  • head_config (Optional[ConfigType]) – Configuration of the network MLP head.

  • min_latent_dim (int) – Minimum dimension of the latent space representation. Defaults to 8.

  • max_latent_dim (int) – Maximum dimension of the latent space representation. Defaults to 128.

  • n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.

  • latent_dim (int) – Dimension of the latent space representation.

  • device (str) – Device to use for the network.

build_network_head(net_config: Dict[str, Any]) → None

Builds the head of the network based on the passed configuration.

Parameters:

net_config (Dict[str, Any]) – Configuration of the network head.

forward(obs: Tensor | Dict[str, Tensor] | Tuple[Tensor, ...]) → Tensor

Forward pass of the Q network.

Parameters:

obs (TorchObsType) – Input to the network.

Returns:

Output of the network.

Return type:

torch.Tensor

recreate_network() → None

Recreates the network.

RainbowQNetwork

class agilerl.networks.q_networks.RainbowQNetwork(*args, **kwargs)

RainbowQNetwork is an extension of the QNetwork that incorporates the Rainbow DQN improvements from “Rainbow: Combining Improvements in Deep Reinforcement Learning” (Hessel et al., 2017).

Paper: https://arxiv.org/abs/1710.02298
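Example (a minimal sketch, not taken from the AgileRL documentation): building a RainbowQNetwork with a distributional support of 51 atoms over an assumed value range of [-10, 10], then querying expected Q values as well as per-atom log-probabilities via the q and log flags of forward(). Shapes and the value range are assumptions.

import torch
from gymnasium import spaces

from agilerl.networks.q_networks import RainbowQNetwork

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Discrete(4)

# Distributional support: 51 atoms spanning an assumed value range [-10, 10]
num_atoms = 51
support = torch.linspace(-10.0, 10.0, num_atoms)

rainbow_q_net = RainbowQNetwork(
    observation_space=observation_space,
    action_space=action_space,
    support=support,
    num_atoms=num_atoms,
    device="cpu",
)

obs = torch.randn(32, 8)
q_values = rainbow_q_net(obs)                       # expected Q values, assumed shape (32, 4)
log_probs = rainbow_q_net(obs, q=False, log=True)   # per-action log-probabilities over the atoms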

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (DiscreteSpace) – Action space of the environment.

  • encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, in which case it is automatically built from the observation space using an AgileRL evolvable module.

  • encoder_config (ConfigType) – Configuration of the encoder network.

  • support (torch.Tensor) – Support for the distributional value function.

  • num_atoms (int) – Number of atoms in the distributional value function. Defaults to 51.

  • head_config (Optional[ConfigType]) – Configuration of the network MLP head.

  • min_latent_dim (int) – Minimum dimension of the latent space representation. Defaults to 8.

  • max_latent_dim (int) – Maximum dimension of the latent space representation. Defaults to 128.

  • n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.

  • latent_dim (int) – Dimension of the latent space representation.

  • device (str) – Device to use for the network.

build_network_head(net_config: Dict[str, Any]) → None

Builds the value and advantage heads of the network based on the passed configuration.

Parameters:

net_config (Dict[str, Any]) – Configuration of the network head.

forward(obs: Tensor | Dict[str, Tensor] | Tuple[Tensor, ...], q: bool = True, log: bool = False) → Tensor

Forward pass of the Rainbow Q network.

Parameters:
  • obs (torch.Tensor, dict[str, torch.Tensor], or tuple[torch.Tensor, ...]) – Input to the network.

  • q (bool) – Whether to return Q values. Defaults to True.

  • log (bool) – Whether to return log probabilities. Defaults to False.

Returns:

Output of the network.

Return type:

torch.Tensor

recreate_network() → None

Recreates the network.

ContinuousQNetwork

class agilerl.networks.q_networks.ContinuousQNetwork(*args, **kwargs)

ContinuousQNetwork is an extension of the QNetwork that is used for continuous action spaces. This is used in off-policy algorithms like DDPG and TD3. The network predicts the Q value for a given state-action pair.

Paper: https://arxiv.org/abs/1509.02971
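Example (a minimal sketch, not taken from the AgileRL documentation): using a ContinuousQNetwork as a critic for a hypothetical environment with an 8-dimensional observation space and a 2-dimensional continuous action space. The network returns the Q value for each state-action pair in the batch; the output shape is an assumption.

import torch
from gymnasium import spaces

from agilerl.networks.q_networks import ContinuousQNetwork

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,))

critic = ContinuousQNetwork(
    observation_space=observation_space,
    action_space=action_space,
    device="cpu",
)

obs = torch.randn(32, 8)                 # batch of 32 observations
actions = torch.rand(32, 2) * 2.0 - 1.0  # batch of actions in [-1, 1]
q_values = critic(obs, actions)          # Q(s, a) for each state-action pair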

Parameters:
  • observation_space (spaces.Space) – Observation space of the environment.

  • action_space (spaces.Box) – Action space of the environment.

  • encoder_cls (Optional[Union[str, Type[EvolvableModule]]]) – Encoder class to use for the network. Defaults to None, in which case it is automatically built from the observation space using an AgileRL evolvable module.

  • encoder_config (ConfigType) – Configuration of the encoder network.

  • head_config (Optional[ConfigType]) – Configuration of the network MLP head.

  • min_latent_dim (int) – Minimum dimension of the latent space representation. Defaults to 8.

  • max_latent_dim (int) – Maximum dimension of the latent space representation. Defaults to 128.

  • n_agents (Optional[int]) – Number of agents in the environment. Defaults to None, which corresponds to single-agent environments.

  • latent_dim (int) – Dimension of the latent space representation.

  • simba (bool) – Whether to use SimBA for the network. Defaults to False.

  • normalize_actions (bool) – Whether to normalize the actions. Defaults to False; it is set to True if the encoder contains nn.LayerNorm layers.

  • device (str) – Device to use for the network.

build_network_head(net_config: IsDataclass | Dict[str, Any] | None = None) → None

Builds the head of the network.

Parameters:

net_config (Optional[ConfigType]) – Configuration of the network head.

forward(obs: Tensor | Dict[str, Tensor] | Tuple[Tensor, ...], actions: ArrayLike | Tensor) → Tensor

Forward pass of the network.

Parameters:
  • obs (torch.Tensor, dict[str, torch.Tensor], or tuple[torch.Tensor, ...]) – Input tensor.

  • actions (numpy.typing.ArrayLike or torch.Tensor) – Actions tensor.

Returns:

Output tensor.

Return type:

torch.Tensor

recreate_network() → None

Recreates the network.