Creating Custom Algorithms¶
To create a custom algorithm, you must inherit from RLAlgorithm for
single-agent algorithms or MultiAgentRLAlgorithm for multi-agent
algorithms. For an overview of the class hierarchy and the philosophy behind it please refer to EvolvableAlgorithm Base Class. We have implemented
this hierarchy with the idea of making evolutionary hyperparameter optimization as seamless as possible, and have users focus on their
implementation only. The key components in developing a custom AgileRL algorithm are the following:
Network Groups¶
Users must specify the “network groups” in their algorithm. A network group is a group of networks that work hand in hand with a common objective,
and is registered through a NetworkGroup object, which contains at least one
evaluation network (i.e. a network that is optimized during training e.g. the Q-network in DQN) and, optionally, “shared” networks that share
parameters with the evaluation network in the group but aren’t optimized during training directly (e.g. the target network in DQN). An RL algorithm
must also contain one NetworkGroup corresponding to the policy (i.e. the network used to
select actions), signalled by setting policy=True in the NetworkGroup object.
PPO Example¶
In PPO, we would need to define two network groups, since there are two different networks that are optimized during training. The first network group
corresponds to the actor network and the second to the critic network. The actor network is responsible for selecting actions, and is therefore signalled
as the policy. In this case, there are no networks that share parameters with the actor or the critic so we can bypass the shared argument. We can
register these groups as follows through the register_network_group method of the algorithm:
# Register network groups for mutations
self.register_network_group(
NetworkGroup(
eval=self.actor,
policy=True
)
)
self.register_network_group(
NetworkGroup(
eval=self.critic
)
)
OptimizerWrapper¶
The last thing users should do when creating a custom algorithm is wrap their optimizers in an OptimizerWrapper,
specifying the networks that the optimizer is responsible for. Since we are mutating network architectures during training, we need to have knowledge of
this in order to reinitiliaze the optimizers correctly when we do so. In the example below, we have a single optimizer that optimizes the parameters of both the actor and critic networks,
so we can wrap it as follows:
from agilerl.algorithms.core.optimizer_wrapper import OptimizerWrapper
from torch.optim import Adam
# NOTE: We must pass the attributes containing the mutable networks to the OptimizerWrapper
self.optimizer = OptimizerWrapper(
optim.Adam,
networks=[self.actor, self.critic],
lr=self.lr
)
Note
All of the network groups and optimizers of an algorithm should by convention all be defined in the __init__ method of the algorithm. When initializing
an OptimizerWrapper, we require users to pass either a network or a list of networks that are stored as attributes in the algorithm (i.e. preceded by self.).
Finally, users only need to implement the following methods to train agents with the AgileRL framework:
1. learn(): Responsible for updating the parameters of the networks and the optimizer after collecting
a set of experiences from the environment.
get_action(): Select action/s from a given observation or batch of observations.test(): Test the agent in the environment without updating the parameters of the networks.