Creating Custom Algorithms¶
To create a custom algorithm, you must inherit from RLAlgorithm for single-agent algorithms or MultiAgentRLAlgorithm for multi-agent algorithms. For an overview of the class hierarchy and the philosophy behind it, please refer to EvolvableAlgorithm Base Class. We have designed this hierarchy to make evolutionary hyperparameter optimization as seamless as possible, letting users focus solely on their own implementation. The key components in developing a custom AgileRL algorithm are described in the sections that follow.
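As a rough sketch of what this inheritance implies, a minimal custom single-agent algorithm might start out as follows. The import path and constructor arguments shown here are assumptions, so check the RLAlgorithm API reference for the exact signature:
from agilerl.algorithms.core import RLAlgorithm  # assumed import path; may differ between versions


class CustomAlgorithm(RLAlgorithm):
    """Sketch of a custom single-agent algorithm."""

    def __init__(self, observation_space, action_space, lr=1e-4, **kwargs):
        # Constructor arguments here are illustrative assumptions
        super().__init__(observation_space, action_space, **kwargs)

        self.lr = lr
        # Build networks, register network groups and wrap optimizers here,
        # as described in the sections below.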
Network Groups¶
Users must specify the “network groups” in their algorithm. A network group is a group of networks that work hand in hand with a common objective, and is registered through a NetworkGroup object, which contains at least one evaluation network (i.e. a network that is optimized during training, e.g. the Q-network in DQN) and, optionally, “shared” networks that share parameters with the evaluation network in the group but aren’t optimized during training directly (e.g. the target network in DQN). An RL algorithm must also contain one NetworkGroup corresponding to the policy (i.e. the network used to select actions), signalled by the policy attribute in the group.
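For instance, in a DQN-style algorithm the target network shares parameters with the Q-network but is not optimized directly, so it would be registered through the shared argument. The following is a sketch, assuming the networks are stored as self.actor and self.actor_target:
# Q-network is the policy and is optimized during training; the target
# network shares its parameters but is not updated by the optimizer
self.register_network_group(
    NetworkGroup(
        eval=self.actor,
        shared=self.actor_target,
        policy=True
    )
)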
Example¶
In PPO, we would need to define two network groups, since there are two different networks that are optimized during training. The first network group corresponds to the actor network and the second to the critic network. The actor network is responsible for selecting actions, and should therefore be signalled as the policy through the policy argument of NetworkGroup. In this case, there are no networks that share parameters with the actor or the critic, so we can omit the shared argument. We can register these groups through the register_network_group method of the algorithm as follows:
# Register network groups for mutations
self.register_network_group(
    NetworkGroup(
        eval=self.actor,
        policy=True
    )
)
self.register_network_group(
    NetworkGroup(
        eval=self.critic
    )
)
OptimizerWrapper¶
The last thing users should do when creating a custom algorithm is wrap their optimizers in an OptimizerWrapper, specifying the networks that each optimizer is responsible for optimizing. Since network architectures are mutated during training, AgileRL needs to know which networks an optimizer updates in order to reinitialize it correctly after a mutation. In the above example, we have a single optimizer that optimizes the parameters of both the actor and critic networks, so we can wrap it as follows:
self.optimizer = OptimizerWrapper(
    optim.Adam,
    networks=[self.actor, self.critic],
    lr=self.lr
)
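If an algorithm instead uses a separate optimizer per network, as many actor-critic implementations do, each optimizer gets its own OptimizerWrapper listing only the networks it updates. The following is a sketch using illustrative attribute names:
# One wrapper per optimizer, each listing only the networks it updates
self.actor_optimizer = OptimizerWrapper(
    optim.Adam,
    networks=[self.actor],
    lr=self.lr
)
self.critic_optimizer = OptimizerWrapper(
    optim.Adam,
    networks=[self.critic],
    lr=self.lr
)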
Note
All of the network groups and optimizers of an algorithm should, by convention, be defined in the __init__ method of the algorithm.
Finally, users only need to implement the following methods to train agents with the AgileRL framework:
1. learn(): Responsible for updating the parameters of the networks and the optimizer after collecting a set of experiences from the environment.
2. get_action(): Select an action, or a batch of actions, from a given observation or batch of observations.
3. test(): Test the agent in the environment without updating the parameters of the networks.
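A rough skeleton of these three methods might look as follows. The signatures and arguments are illustrative assumptions; see the existing AgileRL algorithms for the exact interfaces the framework expects:
def get_action(self, obs):
    # Select an action (or batch of actions) for the given observation(s),
    # e.g. by passing them through the policy network
    ...

def learn(self, experiences):
    # Update the parameters of the networks and the optimizer using a
    # batch of experiences collected from the environment
    ...

def test(self, env):
    # Roll out the current policy in the environment without updating
    # any parameters, and return the achieved reward
    ...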