
Streamlining reinforcement learning.

AgileRL is a deep reinforcement learning library focused on improving the development process by introducing RLOps: MLOps for reinforcement learning.

This library is initially focused on reducing the time taken for training models and hyperparameter optimisation (HPO) by pioneering evolutionary HPO techniques for reinforcement learning. Evolutionary HPO has been shown to drastically reduce overall training times by automatically converging on optimal hyperparameters, without requiring numerous training runs. We are constantly adding more algorithms and features. AgileRL already includes state-of-the-art evolvable on-policy, off-policy, offline and multi-agent reinforcement learning algorithms with distributed training.
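The idea of tuning hyperparameters inside a single run can be illustrated with a toy sketch. This is not AgileRL's API: the names `toy_fitness`, `tournament_select`, `mutate` and `evolve`, the single `lr` hyperparameter, and the fitness function are all hypothetical. The sketch assumes tournament selection with elitism and random mutation as the evolutionary operators, and only shows how a population can drift towards a good hyperparameter value without restarting training.

```python
import math
import random


def toy_fitness(lr: float) -> float:
    """Hypothetical stand-in for an evaluation score; peaks at lr = 1e-3."""
    return -abs(math.log10(lr) - math.log10(1e-3))


def tournament_select(population, fitnesses, k, rng):
    """Pick k random members and return a copy of the fittest one."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return dict(population[best])


def mutate(member, rng, prob=0.5):
    """With probability `prob`, halve or double the learning rate (clipped)."""
    child = dict(member)
    if rng.random() < prob:
        factor = rng.choice([0.5, 2.0])
        child["lr"] = min(max(child["lr"] * factor, 1e-6), 1.0)
    return child


def evolve(pop_size=6, generations=20, seed=0):
    """Run the toy loop; return the best member and the best fitness per generation."""
    rng = random.Random(seed)
    population = [{"lr": 10 ** rng.uniform(-6, 0)} for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fitnesses = [toy_fitness(m["lr"]) for m in population]
        history.append(max(fitnesses))
        # Elitism: the current best member survives unchanged.
        elite_idx = max(range(pop_size), key=lambda i: fitnesses[i])
        elite = dict(population[elite_idx])
        # The rest of the next generation are mutated tournament winners.
        children = [
            mutate(tournament_select(population, fitnesses, 2, rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    fitnesses = [toy_fitness(m["lr"]) for m in population]
    history.append(max(fitnesses))
    best = population[max(range(pop_size), key=lambda i: fitnesses[i])]
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print(best, history[0], history[-1])
```

Because the elite member is carried over unchanged, the best fitness in the population never decreases from one generation to the next. In a real setting the fitness would come from evaluating each agent in the environment rather than from a closed-form function.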

Join the AgileRL Discord server to ask questions, get help, and learn more about reinforcement learning.

AgileRL offers 10x faster hyperparameter optimization than state-of-the-art (SOTA) alternatives.

Global steps is the sum of every step taken by any agent in the environment, including across an entire population, during the entire hyperparameter optimization process.




Reinforcement learning algorithms and libraries are usually benchmarked once the optimal hyperparameters for training are known, but it often takes hundreds or thousands of experiments to discover these. This is unrealistic and does not reflect the true, total time taken for training. What if we could remove the need to conduct all these prior experiments?

In the charts below, a single AgileRL run, which automatically tunes hyperparameters, is benchmarked against the multiple training runs that Optuna-based hyperparameter optimization traditionally requires, demonstrating the real time savings possible. Global steps is the sum of every step taken by any agent in the environment, including across an entire population.
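As a small worked example of the global steps metric, the sketch below sums the environment steps over every population member and every generation. The helper `global_steps` and the numbers used (a population of 4, 3 generations, 1000 steps per member per generation) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def global_steps(steps_per_generation):
    """Sum environment steps over all generations and all population members.

    `steps_per_generation` is a list with one entry per generation; each entry
    is a list holding the steps taken by each population member.
    """
    return sum(sum(generation) for generation in steps_per_generation)


# Hypothetical run: 3 generations, population of 4, 1000 steps per member.
run = [[1000] * 4 for _ in range(3)]
total = global_steps(run)
print(total)  # 12000
```

Counting steps this way charges an evolutionary run for its whole population, so a comparison against several independent tuning runs is made on the total number of environment interactions rather than on the steps of a single agent.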

AgileRL offers an order-of-magnitude speedup in hyperparameter optimization compared with popular reinforcement learning training frameworks combined with Optuna. Remove the need for multiple training runs and save yourself hours.

AgileRL also supports multi-agent reinforcement learning using the PettingZoo parallel API. The charts below highlight the performance of our MADDPG and MATD3 algorithms with evolutionary HPO, benchmarked against epymarl's MADDPG algorithm with grid-search HPO, for the simple speaker listener and simple spread environments.


Install as a package with pip:

pip install agilerl

Or install in development mode:

git clone && cd AgileRL
pip install -e .


cd demos

or to demo distributed training:

cd demos
accelerate launch --config_file configs/accelerate/accelerate.yaml demos/


We have created tutorials on how to use AgileRL and train agents on a variety of tasks. Currently, we have tutorials for single-agent Gymnasium tasks, multi-agent PettingZoo environments, learning hierarchical skills and using contextual multi-armed bandits. Our demo files also provide examples of how to train agents using AgileRL, and there are various algorithm-level examples throughout the documentation.