Using Custom Architecture¶
In all other AgileRL tutorials, the networks used by the agents have been consistently defined with a network configuration
dictionary. Alternatively, it is also possible for a user to define their own custom architecture and add evolvable
functionality through the MakeEvolvable
wrapper.
Creating a Custom Evolvable Network¶
To create a custom evolvable network, firstly you need to define your network class, ensuring correct input and output
dimensions. Below is an example of a simple multi-layer perceptron that can be used by a DQN agent to solve the Lunar
Lander environment. The input size is set as the state dimensions and output size the action dimensions. It’s worth noting
that, during the model definition, it is imperative to employ the torch.nn
module to define all layers instead
of relying on functions from torch.nn.functional
within the forward() method of the network. This is crucial as the
forward hooks implemented will only be able to detect layers derived from nn.Module
.
import torch.nn as nn
import torch
class MLPActor(nn.Module):
def __init__(self, input_size, output_size):
super(MLPActor, self).__init__()
self.linear_layer_1 = nn.Linear(input_size, 64)
self.linear_layer_2 = nn.Linear(64, output_size)
self.relu = nn.ReLU()
def forward(self, x):
x = self.relu(self.linear_layer_1(x))
x = self.linear_layer_2(x)
return x
To make this network evolvable, simply instantiate an MLP Actor object and then pass it, along with an input tensor into
the MakeEvolvable
wrapper.
from agilerl.wrappers.make_evolvable import MakeEvolvable
state_dim = env.single_observation_space.shape
action_dim = env.single_action_space.n
actor = MLPActor(state_dim[0], action_dim)
evolvable_actor = MakeEvolvable(actor,
input_tensor=torch.randn(state_dim[0]),
device=device)
There are two further considerations to make when defining custom architecture. The first is when instantiating the
initialPopulation
object, you need to set net_config
to None
and actor_network
to evolvable_actor
.
pop = initialPopulation(algo="DQN", # Algorithm
state_dim=state_dim, # State dimension
action_dim=action_dim, # Action dimension
one_hot=one_hot, # One-hot encoding
net_config=None, # Network configuration set as None
actor_network=evolvable_actor, # Custom evolvable actor
INIT_HP=INIT_HP, # Initial hyperparameters
population_size=INIT_HP["POPULATION_SIZE"], # Population size
device=device)
If you are using an algorithm that also uses a single critic (PPO, DDPG), define the critic network and pass it into the
initialPopulation
class, again setting net_config
to None
.
pop = initialPopulation(algo="DDPG", # Algorithm
state_dim=state_dim, # State dimension
action_dim=action_dim, # Action dimension
one_hot=one_hot, # One-hot encoding
net_config=None, # Network configuration set as None
actor_network=evolvable_actor, # Custom evolvable actor
critic_network=evolvable_critic, # Custom evolvable critic
INIT_HP=INIT_HP, # Initial hyperparameters
population_size=INIT_HP["POPULATION_SIZE"], # Population size
device=device)
If the single agent algorithm has more than one critic (e.g. TD3), then pass the critic_network
argument a list of two critics.
pop = initialPopulation(algo="TD3", # Algorithm
state_dim=state_dim, # State dimension
action_dim=action_dim, # Action dimension
one_hot=one_hot, # One-hot encoding
net_config=None, # Network configuration set as None
actor_network=evolvable_actor, # Custom evolvable actor
critic_network=[evolvable_critic_1, evolvable_critic_2], # Custom evolvable critic
INIT_HP=INIT_HP, # Initial hyperparameters
population_size=INIT_HP["POPULATION_SIZE"], # Population size
device=device)
If you are using a multi-agent algorithm, define actor_network
and critic_network
as lists containing networks for each agent in the
multi-agent environment. The example below outlines how this would work for a two agent environment.
# For MADDPG
evolvable_actors = [actor_network_1, actor_network_2]
evolvable_critics = [critic_network_1, critic_network_2]
# For MATD3, "critics" will be a list of 2 lists as MATD3 uses one more critic than MADDPG
evolvable_actors = [actor_network_1, actor_network_2]
evolvable_critics = [[critic_1_network_1, critic_1_network_2],
[critic_2_network_1, critic_2_network_2]]
# Instantiate the populations as follows
pop = initialPopulation(algo="MADDPG", # Algorithm
state_dim=state_dim, # State dimensions
action_dim=action_dim, # Action dimensions
one_hot=one_hot, # One-hot encoding
net_config=None, # Network configuration se t as None
actor_network=evolvable_actors, # Custom evolvable actor
critic_network=evolvable_critics, # Custom evolvable critic
INIT_HP=INIT_HP, # Initial hyperparameters
population_size=INIT_HP["POPULATION_SIZE"], # Population size
device=device)
Finally, if you are using a multi-agent algorithm but need to use CNNs to account for RGB image states, there are a few extra considerations
that need to be taken into account when defining your critic network. In MADDPG and MATD3, each agent consists of an actor and critic and each
critic evaluates the states and actions of all agents that act in the multi-agent system. Unlike with non-RGB environments that require MLPs, we cannot
immediately stack the state and action tensors due to differing dimensions, we must first pass the state tensor through the convolutinal layers,
before flattening the output, combining with the actions tensor, and then passing this combined state-action tensor into the fully-connected layer.
This means that when defining the critic, the .forward()
method must account for two input tensors (states and actions). Below are examples of
how to define actor and critic networks for a two agent system with state tensors of shape (4, 210, 160):
from agilerl.networks.custom_activation import GumbelSoftmax
class MultiAgentCNNActor(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv3d(
in_channels=4, out_channels=16, kernel_size=(1, 3, 3), stride=4
)
self.conv2 = nn.Conv3d(
in_channels=16, out_channels=32, kernel_size=(1, 3, 3), stride=2
)
# Define the max-pooling layers
self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
# Define fully connected layers
self.fc1 = nn.Linear(15200, 256)
self.fc2 = nn.Linear(256, 2)
# Define activation function
self.relu = nn.ReLU()
# Define output activation
self.output_activation = GumbelSoftmax()
def forward(self, state_tensor):
# Forward pass through convolutional layers
x = self.relu(self.conv1(state_tensor))
x = self.relu(self.conv2(x))
# Flatten the output for the fully connected layers
x = x.view(x.size(0), -1)
# Forward pass through fully connected layers
x = self.relu(self.fc1(x))
x = self.output_activation(self.fc2(x))
return x
class MultiAgentCNNCritic(nn.Module):
def __init__(self):
super().__init__()
# Define the convolutional layers
self.conv1 = nn.Conv3d(
in_channels=4, out_channels=16, kernel_size=(2, 3, 3), stride=4
)
self.conv2 = nn.Conv3d(
in_channels=16, out_channels=32, kernel_size=(1, 3, 3), stride=2
)
# Define the max-pooling layers
self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
# Define fully connected layers
self.fc1 = nn.Linear(15208, 256)
self.fc2 = nn.Linear(256, 2)
# Define activation function
self.relu = nn.ReLU()
def forward(self, state_tensor, action_tensor):
# Forward pass through convolutional layers
x = self.relu(self.conv1(state_tensor))
x = self.relu(self.conv2(x))
# Flatten the output for the fully connected layers
x = x.view(x.size(0), -1)
x = torch.cat([x, action_tensor], dim=1)
# Forward pass through fully connected layers
x = self.relu(self.fc1(x))
x = self.fc2(x)
return x
To then make these two CNNs evolvable we pass them, along with input tensors into the MakeEvolvable
wrapper.
actor = MultiAgentCNNActor()
evolvable_actor = MakeEvolvable(network=actor,
input_tensor=torch.randn(1, 4, 1, 210, 160), # (B, C_in, D, H, W) D = 1 as actors are decentralised
device=device)
critic = MultiAgentCNNCritic()
evolvable_critic = MakeEvolvable(network=critic,
input_tensor=torch.randn(1, 4, 2, 210, 160), # (B, C_in, D, H, W)),
# D = 2 as critics are centralised and so we evaluate both agents
secondary_input_tensor=torch.randn(1,8), # Assuming 2 agents each with action dimensions of 4
device=device)
The only other consideration that needs to be made is when instantiating the Mutations
class. The arch
argument should be set
as evolvable_actor.arch
for single agent algorithms or evolvable_actors[0].arch
for multi-agent algorithms.
Compatible Architecture¶
At present, MakeEvolvable
is currently compatible with PyTorch multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs). The
network architecture must also be sequential, that is, the output of one layer serves as the input to the next layer. Outlined below is a comprehensive
table of PyTorch layers that are currently supported by this wrapper:
Layer Type |
PyTorch Compatibility |
---|---|
Pooling |
|
Activation |
|
Normalization |
|
Convolutional |
|
Linear |
|
Compatible Algorithms¶
The following table highlights which AgileRL algorithms are currently compatible with custom architecture:
CQL |
DQN |
DDPG |
TD3 |
PPO |
MADDPG |
MATD3 |
ILQL |
Rainbow-DQN |
---|---|---|---|---|---|---|---|---|
✔️ |
✔️ |
✔️ |
✔️ |
✔️ |
✔️ |
✔️ |
❌ |
❌ |