AgileRL Documentation


Components

  • Experience Replay Buffer
  • Multi-Agent Experience Replay Buffer
  • On-Policy Rollout Buffer
  • Segment Trees
  • Data Structures and Utilities
  • Experience Sampler
Copyright © 2023, AgileRL