AgileRL 2.0 Release Notes¶
This document outlines the new features and updates in AgileRL 2.0. The main focus of this release is to provide a more flexible framework for creating custom evolvable network architectures and algorithms to make the most out of automatic evolutionary hyperparameter optimization during training. We’ve also done some heavy refactoring to make the codebase more modular and scalable, with the hope that users find it easier to plug-and-play with their arbitrarily complex use-cases.
Features:¶
Support for Dictionary / Tuple Spaces: We have implemented the EvolvableMultiInput module, which takes in a (single-level) dictionary or tuple space and assigns an EvolvableCNN to each underlying image subspace. Observations from vector / discrete spaces are simply concatenated to the image encodings by default, but users can specify if they want these to be processed by an EvolvableMLP before concatenating.
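As a rough illustration, the sketch below builds a small dictionary observation space with gymnasium and passes it to EvolvableMultiInput. The import path and the constructor arguments shown here (num_outputs, channel_size, kernel_size, stride_size) are assumptions based on the other evolvable modules, not a definitive signature; consult the module documentation for the real interface.

import numpy as np
from gymnasium import spaces

# Assumed import path and constructor arguments; check the module docs.
from agilerl.modules import EvolvableMultiInput

# A single-level dictionary space mixing an image and a flat vector.
observation_space = spaces.Dict({
    "camera": spaces.Box(low=0, high=255, shape=(3, 84, 84), dtype=np.uint8),
    "speed": spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
})

# An EvolvableCNN is assigned to the "camera" subspace; the "speed" vector is
# concatenated to the image encodings by default (keyword names assumed).
multi_input = EvolvableMultiInput(
    observation_space=observation_space,
    num_outputs=64,
    channel_size=[32],
    kernel_size=[3],
    stride_size=[1],
)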
EvolvableModule Class Hierarchy: A wrapper around nn.Module that allows us to keep track of the mutation methods in complex networks with nested modules. We use the @mutation decorator to signal mutation methods, and these are registered automatically as such. Such modules should implement a recreate_network() method, which is called automatically after any mutation method is used to modify the network’s architecture. A short sketch of such a module follows the note below.
Note
Users can now pass non-evolvable architectures to the algorithms too, by wrapping their models with DummyEvolvable. This is useful when you want to use a pre-trained model, or a model whose architecture you don’t want to mutate, while still enabling random weight and RL hyperparameter mutations. Please refer to Creating Custom Networks for more information.
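Here is a minimal sketch of a custom evolvable module that registers a single architecture mutation with the @mutation decorator and rebuilds its layers in recreate_network(). The import path, the decorator’s exact usage, and the base-class constructor requirements are assumptions for illustration; see Creating Custom Networks for the canonical pattern.

import torch.nn as nn

# Assumed import path for the base class and decorator; may differ in practice.
from agilerl.modules.base import EvolvableModule, mutation

class TwoLayerEncoder(EvolvableModule):
    """Toy encoder whose hidden width can be mutated during evolution."""

    def __init__(self, num_inputs: int, hidden_size: int = 64):
        super().__init__()  # base-class constructor arguments assumed optional
        self.num_inputs = num_inputs
        self.hidden_size = hidden_size
        self.recreate_network()

    @mutation  # registers this method as a mutation (decorator usage assumed)
    def grow_hidden(self) -> None:
        # Mutation methods only change the stored configuration; the framework
        # calls recreate_network() afterwards to rebuild the underlying layers.
        self.hidden_size += 32

    def recreate_network(self) -> None:
        # Rebuild the layers from the current (possibly mutated) configuration.
        self.model = nn.Sequential(
            nn.Linear(self.num_inputs, self.hidden_size),
            nn.ReLU(),
            nn.Linear(self.hidden_size, self.hidden_size),
        )

    def forward(self, x):
        return self.model(x)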
EvolvableNetwork Class Hierarchy: Towards a more general API for algorithm implementation, where complex observation spaces should be inherently supported, networks inheriting from EvolvableNetwork automatically create an appropriate encoder from a given observation space. Custom networks simply have to specify the head that maps the observation encodings to a number of outputs. As part of this update we implement the following common networks, used by default in the already implemented algorithms (see the sketch after this list):

QNetwork: State-action value function (used in e.g. DQN).
RainbowQNetwork: State-action value function that uses a dueling distributional architecture for the network head (used in Rainbow DQN).
ContinuousQNetwork: State-action value function for continuous action spaces, which takes the actions as input alongside the observations.
ValueNetwork: Outputs the scalar value of an observation (used in e.g. PPO).
DeterministicActor: Outputs deterministic actions given an action space.
StochasticActor: Outputs an appropriate PyTorch distribution over the given action space.
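The sketch below constructs one of these networks directly from gymnasium spaces; the encoder is inferred from the observation space, so only the head needs configuring. The import path and the head_config keyword are assumptions taken for illustration (the keyword mirrors the net_config example later in these notes).

from gymnasium import spaces

# Assumed import path; the networks now live under agilerl.networks.
from agilerl.networks import QNetwork

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Discrete(4)

# A vector observation space should yield an MLP encoder automatically;
# the head_config keyword and its contents are assumed here.
q_net = QNetwork(
    observation_space=observation_space,
    action_space=action_space,
    head_config={"hidden_size": [64, 64]},
)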
EvolvableAlgorithm Class Hierarchy: We create a class hierarchy for algorithms with a focus on evolutionary hyperparameter optimization. The EvolvableAlgorithm base class implements methods common to any RL algorithm, e.g. save_checkpoint() and load(), as well as methods pertaining specifically to mutations, e.g. clone(). Under the hood, it initializes a MutationRegistry that users should use to register “network groups”. The registry also keeps track of the optimizers and of the RL hyperparameters users wish to mutate during training. Users wishing to create custom algorithms should now only need to worry about implementing the get_action(), learn(), and (for now) test() methods.
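As a usage sketch of the common base-class methods, the snippet below instantiates DQN from observation and action spaces, saves a checkpoint, and clones the agent. The import path and the exact constructor keywords are assumptions; check the DQN documentation for the full signature.

from gymnasium import spaces

# Assumed import path for the built-in algorithm.
from agilerl.algorithms import DQN

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Discrete(4)

agent = DQN(observation_space=observation_space, action_space=action_space)

# Methods implemented on the EvolvableAlgorithm base class.
agent.save_checkpoint("dqn_checkpoint.pt")
clone = agent.clone()  # used by the mutations framework during evolution
# restored = DQN.load("dqn_checkpoint.pt")  # load() signature assumed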
Generalized Mutations: We have refactored Mutations with the above hierarchies in mind, to allow for a generalized mutations framework that works for any combination of evolvable networks in an algorithm. Moreover, users can now pass any configuration of RL hyperparameters they wish to mutate during training directly to an algorithm inheriting from EvolvableAlgorithm, rather than handling this in Mutations. For an example of how to do this, please refer to the documentation of any of the algorithms implemented in AgileRL, or to our tutorials.
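For instance, a configuration of mutable RL hyperparameters might be passed directly to the algorithm as sketched below. The HyperparameterConfig and RLParameter names, their import path, and the hp_config keyword are assumptions for illustration only; refer to the algorithm documentation or our tutorials for the exact interface.

from gymnasium import spaces

# Assumed import paths and names for illustration only.
from agilerl.algorithms import DQN
from agilerl.algorithms.core.registry import HyperparameterConfig, RLParameter

observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,))
action_space = spaces.Discrete(4)

# Declare which RL hyperparameters may be mutated during training, and their ranges.
hp_config = HyperparameterConfig(
    lr=RLParameter(min=1e-4, max=1e-2),
    batch_size=RLParameter(min=32, max=256, dtype=int),
)

agent = DQN(
    observation_space=observation_space,
    action_space=action_space,
    hp_config=hp_config,
)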
Breaking Changes:¶
We have placed the building blocks of our networks in a dedicated agilerl.modules module, which contains the off-the-shelf evolvable modules that can be used to create custom network architectures (e.g. EvolvableMLP, EvolvableCNN, and EvolvableMultiInput), whereas before these were located in agilerl.networks. The latter now contains the networks created through the EvolvableNetwork class hierarchy.

Pass in observation_space and action_space to the algorithms instead of state_dim and action_dim. This supports more complex observation spaces and allows simpler generation of default networks in the algorithms through the EvolvableNetwork class hierarchy.

Simplified API in the evolvable modules, mutations, and algorithms. Please refer to the documentation for more information.
The net_config argument of algorithms should now be passed in with the arguments of the corresponding EvolvableNetwork class. For example, in PPO, the net_config argument might include an “encoder_config” key, whose contents depend on your observation space, and a “head_config” key for the head of the actor (i.e. StochasticActor) and critic (i.e. ValueNetwork). All the networks in an algorithm are initialized with the same architecture by default. If users wish to use different architectures, these should be passed as arguments directly to the algorithm.
Example Network Configuration¶
net_config = {
    # For an image observation space we encode observations using EvolvableCNN
    "encoder_config": {
        "channel_size": [32],
        "kernel_size": [3],
        "stride_size": [1],
    },
    # The head is usually an EvolvableMLP by default
    "head_config": {
        "hidden_size": [64, 64],
    },
}
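A configuration like the one above is then passed straight to an algorithm, together with the new observation_space and action_space arguments. The snippet below is a sketch: the import path and the remaining PPO defaults are assumptions.

import numpy as np
from gymnasium import spaces

# Assumed import path; see the PPO documentation for the full argument list.
from agilerl.algorithms import PPO

observation_space = spaces.Box(low=0, high=255, shape=(3, 84, 84), dtype=np.uint8)
action_space = spaces.Discrete(6)

# The actor (StochasticActor) and critic (ValueNetwork) share this architecture
# by default; pass networks directly to the algorithm to use different ones.
agent = PPO(
    observation_space=observation_space,
    action_space=action_space,
    net_config=net_config,  # the dictionary defined above
)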