Evolvable Neural Networks in AgileRL
Other than the hyperparameters pertaining to the specific algorithm you’re using to optimize your agent, a large source of variance in your agent’s performance is the choice of network architecture. Tuning the architecture of your network is usually a very time-consuming and tedious task, requiring multiple training runs that can take days or even weeks to execute. AgileRL allows you to automatically tune the architecture of your network in a single training run through evolutionary hyperparameter optimization.
Basic Neural Networks
In order to mutate the architecture of neural networks seamlessly, we define the EvolvableModule base class as a building block for all networks used in AgileRL. This is nothing but a wrapper around Module that allows us to keep track of the methods that mutate a network’s architecture, even in networks with nested evolvable modules.

Figure: Structure of an EvolvableModule, showing its relationship with torch.nn.Module and its mutation capabilities.
Examples of some very basic modules included in AgileRL are listed below (a short usage sketch follows the list):

- EvolvableMLP: Multi-layer perceptron (MLP) network that maps vector observations to a desired number of outputs, including mutation methods that allow for the random addition or removal of layers and nodes.
- EvolvableCNN: Convolutional neural network (CNN) that maps image observations to a desired number of outputs, including mutation methods that allow for the random addition or removal of convolutional layers and channels, as well as changing the kernel sizes.
- EvolvableMultiInput: Network that maps dictionary or tuple observations to a desired number of outputs. This module includes nested EvolvableModule’s to process each element of the dictionary or tuple observation separately into a latent space, which are then concatenated and processed by a final dense layer to form a number of outputs. Includes the mutation methods of all nested EvolvableModule’s.
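As a minimal sketch of how one of these modules can be used directly, the snippet below constructs an EvolvableMLP and applies one of its architecture mutations. The import path, the constructor argument names (num_inputs, num_outputs), and the mutation method name (add_layer) are assumptions based on the configuration keys used elsewhere on this page and may differ between AgileRL versions.

import torch

from agilerl.modules import EvolvableMLP

# Hypothetical sketch: argument names are assumed from the config keys
# used later on this page (hidden_size, min_mlp_nodes, max_mlp_nodes).
mlp = EvolvableMLP(
    num_inputs=4,          # Size of the vector observation (assumed argument name)
    num_outputs=2,         # Number of outputs (assumed argument name)
    hidden_size=[64, 64],  # Two hidden layers of 64 nodes each
    min_mlp_nodes=16,      # Lower bound on nodes when mutating
    max_mlp_nodes=128,     # Upper bound on nodes when mutating
)

x = torch.randn(1, 4)
print(mlp(x).shape)  # torch.Size([1, 2])

# Architecture mutations are exposed as methods on the module; the exact
# method name (e.g. add_layer) is an assumption and may differ by version.
mlp.add_layer()
print(mlp(x).shape)  # The output dimension is preserved after the mutation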
Policies, Value Functions, and More Complex Networks
In reinforcement learning, we often need to process very different types of observations into either actions, values, or state-action values. In order to make the implementation of evolvable policies, value functions, and more complex networks as seamless as possible, we define the EvolvableNetwork base class, which inherits from EvolvableModule. The diagram below shows the expected structure of a neural network inheriting from this class.

Figure: Structure of an EvolvableNetwork, showing the underlying encoder and head networks, which are EvolvableModule’s themselves.
When inheriting from this class, we must pass the observation space of the environment to the constructor of the class. This allows the network to automatically build an appropriate encoder from the observation space. Off-the-shelf EvolvableNetwork’s in AgileRL natively support the following observation spaces (a sketch of this automatic encoder selection follows the list):

- Box: Uses an EvolvableMLP, EvolvableCNN, or EvolvableLSTM as the encoder, depending on the dimensionality of the observation space.
- Dict: Uses an EvolvableMultiInput as the encoder.
- Tuple: Uses an EvolvableMultiInput as the encoder.
- MultiBinary: Uses an EvolvableMLP as the encoder.
- MultiDiscrete: Uses an EvolvableMLP as the encoder.
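For instance, under these defaults a network built from a Dict observation space should end up with an EvolvableMultiInput encoder. The snippet below sketches this; the encoder attribute name and the agilerl.modules import path are assumptions about the internal structure and may differ between versions.

from gymnasium.spaces import Box, Dict, Discrete

from agilerl.modules import EvolvableMultiInput
from agilerl.networks.q_networks import QNetwork

# Dictionary observation space with two vector sub-spaces
observation_space = Dict({
    "position": Box(low=-1.0, high=1.0, shape=(4,)),
    "velocity": Box(low=-1.0, high=1.0, shape=(4,)),
})
action_space = Discrete(4)

# Built with default encoder and head configurations
network = QNetwork(observation_space, action_space)

# The encoder is chosen automatically from the observation space;
# `encoder` is an assumed attribute name for the underlying EvolvableModule.
print(isinstance(network.encoder, EvolvableMultiInput))  # expected: True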
The encoder processes observations into a latent space, which is then processed by the head network (usually an EvolvableMLP) to form the final output of the network. The following networks, common in a variety of reinforcement learning algorithms, are supported out of the box (a sketch combining an actor and a critic follows the list):

- QNetwork: Outputs a state-action value given an observation and action (used in e.g. DQN).
- RainbowQNetwork: Uses a distributional dueling architecture to output a distribution of state-action values given an observation and action (used in e.g. Rainbow DQN).
- ContinuousQNetwork: Outputs a continuous state-action value given an observation and action (used in e.g. DDPG, TD3).
- ValueNetwork: Outputs a single value given an observation (used in e.g. PPO, bandit algorithms).
- DeterministicActor: Outputs deterministic actions given an observation (used in e.g. DDPG, TD3).
- StochasticActor: Outputs stochastic actions given an observation (used in e.g. PPO).
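As a brief sketch of how these networks can be combined in an algorithm, the snippet below creates a DDPG-style actor and critic over the same spaces using their default encoder and head configurations. The agilerl.networks.actors import path for DeterministicActor is an assumption inferred from the class names above and may differ between versions.

from gymnasium.spaces import Box

from agilerl.networks.actors import DeterministicActor
from agilerl.networks.q_networks import ContinuousQNetwork

# Continuous control spaces, e.g. a simple locomotion task
observation_space = Box(low=-1.0, high=1.0, shape=(8,))
action_space = Box(low=-1.0, high=1.0, shape=(2,))

# Actor maps observations to deterministic actions
actor = DeterministicActor(observation_space, action_space)

# Critic maps (observation, action) pairs to a scalar state-action value
critic = ContinuousQNetwork(observation_space, action_space)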
Note
All EvolvableNetwork objects expect that the only modules contributing towards their mutation methods are the encoder and head networks. This is done to ensure that the same mutation can be applied across the different networks optimized in an algorithm during training, e.g. actor and critic, since these usually solve problems that are very similar in nature and thus require similar architectures.
Configuring the Architecture of EvolvableNetwork’s
In order to configure the architecture of EvolvableNetwork’s, we must pass in separate dictionaries that specify the architecture of the encoder and head networks through the encoder_config and head_config arguments of the EvolvableNetwork constructor. These dictionaries should include the arguments of the corresponding EvolvableModule’s constructor.
If your environment has a 1D Box observation space, by default the EvolvableNetwork will use an EvolvableMLP as the encoder.

from gymnasium.spaces import Box, Discrete
from agilerl.networks.q_networks import QNetwork

encoder_config = {
    "hidden_size": [64, 64],  # Two layers of 64 nodes each
    "min_mlp_nodes": 16,      # Minimum number of nodes in the MLP when mutating
    "max_mlp_nodes": 128,     # Maximum number of nodes in the MLP when mutating
}

head_config = {
    "hidden_size": [64, 64],  # Two layers of 64 nodes each
    "min_mlp_nodes": 16,      # Minimum number of nodes in the MLP when mutating
    "max_mlp_nodes": 128,     # Maximum number of nodes in the MLP when mutating
}

observation_space = Box(low=-100, high=100, shape=(10,))
action_space = Discrete(2)

network = QNetwork(
    observation_space,
    action_space,
    encoder_config=encoder_config,
    head_config=head_config,
    latent_dim=32,       # Dimension of the latent space representation
    min_latent_dim=8,    # Minimum dimension of the latent space representation
    max_latent_dim=128,  # Maximum dimension of the latent space representation
)
If your environment has a 3D Box observation space, by default the EvolvableNetwork will use an EvolvableCNN as the encoder.

from gymnasium.spaces import Box, Discrete
from agilerl.networks.actors import StochasticActor

encoder_config = {
    "channel_size": [32, 64, 128],  # Three convolutional layers with 32, 64, and 128 channels respectively
    "kernel_size": [8, 4, 3],       # The kernel sizes of the convolutional layers
    "stride_size": [4, 2, 1],       # The stride sizes of the convolutional layers
    "min_channel_size": 16,         # Minimum number of channels in the CNN when mutating
    "max_channel_size": 256,        # Maximum number of channels in the CNN when mutating
}

head_config = {
    "hidden_size": [64, 64],  # Two layers of 64 nodes each
    "min_mlp_nodes": 16,      # Minimum number of nodes in the MLP when mutating
    "max_mlp_nodes": 128,     # Maximum number of nodes in the MLP when mutating
}

observation_space = Box(low=0, high=255, shape=(3, 84, 84))  # Channels-first image observations
action_space = Discrete(2)

network = StochasticActor(
    observation_space,
    action_space,
    encoder_config=encoder_config,
    head_config=head_config,
    latent_dim=32,       # Dimension of the latent space representation
    min_latent_dim=8,    # Minimum dimension of the latent space representation
    max_latent_dim=128,  # Maximum dimension of the latent space representation
)
Note
In AgileRL algorithms, pass a single net_config dictionary that includes the encoder_config and head_config dictionaries, as well as any other arguments to the corresponding network used in the algorithm. A brief sketch is shown below.
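As a minimal sketch of constructing a DQN agent this way, the snippet below assumes net_config nests the encoder and head dictionaries under the keys encoder_config and head_config; the exact accepted keys may vary between AgileRL versions.

from gymnasium.spaces import Box, Discrete

from agilerl.algorithms import DQN

observation_space = Box(low=-1.0, high=1.0, shape=(4,))
action_space = Discrete(2)

# Assumed structure: encoder and head configs nested inside a single net_config
net_config = {
    "encoder_config": {"hidden_size": [64, 64]},
    "head_config": {"hidden_size": [64]},
}

agent = DQN(observation_space, action_space, net_config=net_config)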
Using Non-Evolvable Networks in an Evolvable Setting
It is common for users to want to use either pre-trained networks or custom architectures that don’t inherit from EvolvableModule, while still exploiting evolutionary hyperparameter optimization to automatically tune the RL hyperparameters of an algorithm. To do this, users can use DummyEvolvable to wrap their non-evolvable networks in a manner compatible with our mutations framework, disabling architecture mutations but still allowing for RL hyperparameter and random weight mutations.
Example Usage

import torch
import torch.nn as nn

from agilerl.algorithms import DQN
from agilerl.modules.dummy import DummyEvolvable

class BasicNetActor(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size):
        super().__init__()
        layers = []

        # Add input layer
        layers.append(nn.Linear(input_size, hidden_sizes[0]))
        layers.append(nn.ReLU())  # Activation function

        # Add hidden layers
        for i in range(len(hidden_sizes) - 1):
            layers.append(nn.Linear(hidden_sizes[i], hidden_sizes[i + 1]))
            layers.append(nn.ReLU())  # Activation function

        # Add output layer (no activation; raw outputs)
        layers.append(nn.Linear(hidden_sizes[-1], output_size))

        # Combine all layers into a sequential model
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

actor_kwargs = {
    "input_size": 4,           # Input size
    "hidden_sizes": [64, 64],  # Hidden layer sizes
    "output_size": 2,          # Output size
}
actor = DummyEvolvable(BasicNetActor, actor_kwargs, device=device)

# Use the actor in an algorithm
observation_space = ...
action_space = ...

population = DQN.population(
    size=4,
    observation_space=observation_space,
    action_space=action_space,
    actor_network=actor,
)
Integrating Architecture Mutations Into a Custom PyTorch Network
Warning
The following section pertains to the MakeEvolvable wrapper, which will be deprecated in a future release. We recommend using the EvolvableModule and EvolvableNetwork classes to create custom networks, or wrapping your nn.Module objects with DummyEvolvable.
For sequential architectures that users have already implemented using PyTorch, it is also possible to add evolvable functionality through the MakeEvolvable wrapper. Below is an example of a simple multi-layer perceptron that can be used by a DQN agent to solve the Lunar Lander environment. The input size is set as the state dimensions and the output size as the action dimensions. It’s worth noting that, during the model definition, it is imperative to employ the torch.nn module to define all layers instead of relying on functions from torch.nn.functional within the forward() method of the network. This is crucial as the forward hooks implemented will only be able to detect layers derived from nn.Module.
import torch
import torch.nn as nn

class MLPActor(nn.Module):
    def __init__(self, input_size, output_size):
        super(MLPActor, self).__init__()

        self.linear_layer_1 = nn.Linear(input_size, 64)
        self.linear_layer_2 = nn.Linear(64, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.linear_layer_1(x))
        x = self.linear_layer_2(x)
        return x
To make this network evolvable, simply instantiate an MLPActor object and then pass it, along with an input tensor, into the MakeEvolvable wrapper.

from agilerl.wrappers.make_evolvable import MakeEvolvable

observation_space = env.single_observation_space
action_space = env.single_action_space

actor = MLPActor(observation_space.shape[0], action_space.n)
evolvable_actor = MakeEvolvable(
    actor,
    input_tensor=torch.randn(observation_space.shape[0]),
    device=device,
)
When using create_population to generate a population of agents with a custom actor, you need to set actor_network to evolvable_actor.
from agilerl.utils.utils import create_population

pop = create_population(
    algo="DQN",                                  # Algorithm
    observation_space=observation_space,         # Observation space
    action_space=action_space,                   # Action space
    actor_network=evolvable_actor,               # Custom evolvable actor
    INIT_HP=INIT_HP,                             # Initial hyperparameters
    population_size=INIT_HP["POPULATION_SIZE"],  # Population size
    device=device,
)
If you are using an algorithm that also uses a single critic (PPO, DDPG), define the critic network and pass it into the create_population function.
pop = create_population(
    algo="PPO",                                  # Algorithm
    observation_space=observation_space,         # Observation space
    action_space=action_space,                   # Action space
    actor_network=evolvable_actor,               # Custom evolvable actor
    critic_network=evolvable_critic,             # Custom evolvable critic
    INIT_HP=INIT_HP,                             # Initial hyperparameters
    population_size=INIT_HP["POPULATION_SIZE"],  # Population size
    device=device,
)
If the single-agent algorithm has more than one critic (e.g. TD3), then pass a list of two critics to the critic_network argument.
pop = create_population(
    algo="TD3",                                               # Algorithm
    observation_space=observation_space,                      # Observation space
    action_space=action_space,                                # Action space
    actor_network=evolvable_actor,                            # Custom evolvable actor
    critic_network=[evolvable_critic_1, evolvable_critic_2],  # Custom evolvable critics
    INIT_HP=INIT_HP,                                          # Initial hyperparameters
    population_size=INIT_HP["POPULATION_SIZE"],               # Population size
    device=device,
)
If you are using a multi-agent algorithm, define actor_network and critic_network as lists containing networks for each agent in the multi-agent environment. The example below outlines how this would work for a two-agent environment (assuming you have initialised a multi-agent environment in the variable env).
# For MADDPG
evolvable_actors = [actor_network_1, actor_network_2]
evolvable_critics = [critic_network_1, critic_network_2]

# For MATD3, "critics" will be a list of 2 lists as MATD3 uses one more critic than MADDPG
evolvable_actors = [actor_network_1, actor_network_2]
evolvable_critics = [[critic_1_network_1, critic_1_network_2],
                     [critic_2_network_1, critic_2_network_2]]

# Instantiate the populations as follows
observation_spaces = [env.single_observation_space(agent) for agent in env.agents]
action_spaces = [env.single_action_space(agent) for agent in env.agents]

pop = create_population(
    algo="MADDPG",                               # Algorithm
    observation_space=observation_spaces,        # Observation spaces
    action_space=action_spaces,                  # Action spaces
    actor_network=evolvable_actors,              # Custom evolvable actors
    critic_network=evolvable_critics,            # Custom evolvable critics
    INIT_HP=INIT_HP,                             # Initial hyperparameters
    population_size=INIT_HP["POPULATION_SIZE"],  # Population size
    device=device,
)
Finally, if you are using a multi-agent algorithm but need to use CNNs to account for RGB image states, there are a few extra considerations when defining your critic network. In MADDPG and MATD3, each agent consists of an actor and a critic, and each critic evaluates the states and actions of all agents that act in the multi-agent system. Unlike with non-RGB environments that require MLPs, we cannot immediately stack the state and action tensors due to their differing dimensions. Instead, we must first pass the state tensor through the convolutional layers, flatten the output, combine it with the action tensor, and then pass this combined state-action tensor into the fully-connected layers. This means that when defining the critic, the .forward() method must account for two input tensors (states and actions). Below are examples of how to define actor and critic networks for a two-agent system with state tensors of shape (4, 210, 160):
import torch
import torch.nn as nn

from agilerl.networks.custom_activation import GumbelSoftmax

class MultiAgentCNNActor(nn.Module):
    def __init__(self):
        super().__init__()
        # Define the convolutional layers
        self.conv1 = nn.Conv3d(
            in_channels=4, out_channels=16, kernel_size=(1, 3, 3), stride=4
        )
        self.conv2 = nn.Conv3d(
            in_channels=16, out_channels=32, kernel_size=(1, 3, 3), stride=2
        )

        # Define the max-pooling layers
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Define fully connected layers
        self.fc1 = nn.Linear(15200, 256)
        self.fc2 = nn.Linear(256, 2)

        # Define activation function
        self.relu = nn.ReLU()

        # Define output activation
        self.output_activation = GumbelSoftmax()

    def forward(self, state_tensor):
        # Forward pass through convolutional layers
        x = self.relu(self.conv1(state_tensor))
        x = self.relu(self.conv2(x))

        # Flatten the output for the fully connected layers
        x = x.view(x.size(0), -1)

        # Forward pass through fully connected layers
        x = self.relu(self.fc1(x))
        x = self.output_activation(self.fc2(x))
        return x
class MultiAgentCNNCritic(nn.Module):
    def __init__(self):
        super().__init__()
        # Define the convolutional layers
        self.conv1 = nn.Conv3d(
            in_channels=4, out_channels=16, kernel_size=(2, 3, 3), stride=4
        )
        self.conv2 = nn.Conv3d(
            in_channels=16, out_channels=32, kernel_size=(1, 3, 3), stride=2
        )

        # Define the max-pooling layers
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Define fully connected layers
        self.fc1 = nn.Linear(15208, 256)
        self.fc2 = nn.Linear(256, 2)

        # Define activation function
        self.relu = nn.ReLU()

    def forward(self, state_tensor, action_tensor):
        # Forward pass through convolutional layers
        x = self.relu(self.conv1(state_tensor))
        x = self.relu(self.conv2(x))

        # Flatten the output and concatenate with the action tensor
        x = x.view(x.size(0), -1)
        x = torch.cat([x, action_tensor], dim=1)

        # Forward pass through fully connected layers
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x
To then make these two CNNs evolvable, we pass them, along with input tensors, into the MakeEvolvable wrapper.

actor = MultiAgentCNNActor()
evolvable_actor = MakeEvolvable(
    network=actor,
    input_tensor=torch.randn(1, 4, 1, 210, 160),  # (B, C_in, D, H, W); D = 1 as actors are decentralised
    device=device,
)

critic = MultiAgentCNNCritic()
evolvable_critic = MakeEvolvable(
    network=critic,
    input_tensor=torch.randn(1, 4, 2, 210, 160),  # (B, C_in, D, H, W); D = 2 as critics are centralised and so we evaluate both agents
    secondary_input_tensor=torch.randn(1, 8),     # Assuming 2 agents, each with action dimensions of 4
    device=device,
)
Compatible Architecture
At present, MakeEvolvable is compatible with PyTorch multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs). The network architecture must also be sequential, that is, the output of one layer serves as the input to the next layer. Outlined below is a comprehensive table of PyTorch layers that are currently supported by this wrapper:

| Layer Type | PyTorch Compatibility |
|---|---|
| Pooling | |
| Activation | |
| Normalization | |
| Convolutional | |
| Linear | |
Compatible Algorithms
The following table highlights which AgileRL algorithms are currently compatible with custom architectures:

| CQL | DQN | DDPG | TD3 | PPO | MADDPG | MATD3 | ILQL | Rainbow-DQN |
|---|---|---|---|---|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ |