AgileRL Documentation

Rollouts

Utilities for gathering trajectories of experience from an environment. These helpers can be used directly or passed to agilerl.training.train_on_policy.train_on_policy() to customise how agents interact with an environment.

  • On-Policy Rollout Functions
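
As a rough illustration of what gathering a trajectory of experience involves, the sketch below steps a toy environment with a policy and records each transition. It is a minimal, self-contained example and not AgileRL's implementation: `CountdownEnv`, `Transition`, and `collect_rollout` are hypothetical names invented for this sketch, and the simplified `reset()`/`step()` signatures are assumptions rather than the interface that `train_on_policy()` expects. See the On-Policy Rollout Functions page for the actual helpers.

```python
from typing import Callable, List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One step of experience (hypothetical container for this sketch)."""
    obs: int
    action: int
    reward: float
    next_obs: int
    done: bool


class CountdownEnv:
    """Hypothetical toy environment: a counter that runs from `start` down to 0."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.state = start

    def reset(self) -> int:
        self.state = self.start
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 decrements the counter; action 0 leaves it unchanged.
        self.state -= action
        done = self.state <= 0
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def collect_rollout(
    env: CountdownEnv, policy: Callable[[int], int], n_steps: int
) -> List[Transition]:
    """Run `policy` in `env` for `n_steps` steps and return the transitions."""
    trajectory: List[Transition] = []
    obs = env.reset()
    for _ in range(n_steps):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append(Transition(obs, action, reward, next_obs, done))
        # Start a new episode when the current one ends.
        obs = env.reset() if done else next_obs
    return trajectory


# A fixed policy that always decrements the counter.
rollout = collect_rollout(CountdownEnv(start=3), policy=lambda obs: 1, n_steps=6)
```

Keeping the collection loop in its own function is what makes it swappable: a training loop that accepts a rollout function can change how experience is gathered without touching the learning step.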
Copyright © 2023, AgileRL