Experience Replay Buffer
To train a population of RL agents efficiently, the agents share a single memory of experiences. This requires off-policy algorithms, because only they can learn from transitions collected by a different policy. Sharing memory reduces the exploration each individual agent needs, since it can learn from the behaviour of the other agents. For example, if you could watch several people attempt to solve a maze, you could learn from their mistakes and successes without having to explore the entire maze yourself.
The object used to store experiences collected by agents in the environment is called the Experience Replay Buffer, and is defined by the class ReplayBuffer().
During training, we use the ReplayBuffer.add() function to add experiences to the buffer as TensorDict objects. Each transition is built with the
Transition tensorclass, which holds the obs, action, reward, next_obs, and done fields as torch.Tensor objects. To sample from the replay
buffer, call ReplayBuffer.sample().
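import torch
from agilerl.components.replay_buffer import ReplayBuffer

# Use a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

memory = ReplayBuffer(
    max_size=10000,  # Max replay buffer size
    device=device,
)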
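To make the shared-memory idea concrete, here is a minimal sketch in plain Python. It does not use AgileRL: the deque, the collect() helper, and the extra "agent" field are hypothetical stand-ins chosen for illustration, and transitions are plain dicts rather than TensorDict objects.

```python
import random
from collections import deque

# Hypothetical stand-in for a shared replay memory (not AgileRL's ReplayBuffer).
shared_memory = deque(maxlen=1000)


def collect(agent_id, n_steps):
    """Pretend that agent `agent_id` acts in an environment for n_steps."""
    for t in range(n_steps):
        shared_memory.append(
            {
                "obs": t,
                "action": 0,
                "reward": float(t),
                "next_obs": t + 1,
                "done": t == n_steps - 1,
                "agent": agent_id,  # illustration only: records who collected it
            }
        )


# Three agents in the population write into the same memory.
for agent_id in range(3):
    collect(agent_id, n_steps=5)

# Any agent can now draw a batch that mixes experience from several agents.
rng = random.Random(0)
batch = rng.sample(list(shared_memory), k=8)
contributors = {t["agent"] for t in batch}
print(f"{len(shared_memory)} stored, batch drawn from agents {sorted(contributors)}")
```

Because each agent contributed only five transitions, a batch of eight necessarily contains experience from at least two agents, which is the effect the maze analogy describes.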
Parameters
- class agilerl.components.replay_buffer.ReplayBuffer(max_size: int, device: str | device = 'cpu', dtype: dtype = torch.float32)
A circular replay buffer for off-policy learning using a TensorDict as storage.
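- Parameters:
max_size (int) – Maximum number of transitions the buffer can hold
device (str | torch.device) – Device used by the buffer, defaults to 'cpu'
dtype (torch.dtype) – Data type used by the buffer, defaults to torch.float32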
- property storage: TensorDict
Storage of the buffer.
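As an illustration of what "circular" means here, the following is a minimal sketch in plain Python, not AgileRL's implementation. The RingBuffer class and its attribute names are hypothetical, and a Python list stands in for the TensorDict storage. Once the buffer is full, each new transition overwrites the oldest one.

```python
import random


class RingBuffer:
    """Hypothetical minimal circular buffer, for illustration only."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.storage = [None] * max_size  # preallocated slots
        self.pos = 0  # index of the next slot to write
        self.size = 0  # number of valid entries

    def add(self, transition):
        self.storage[self.pos] = transition
        self.pos = (self.pos + 1) % self.max_size  # wrap around when full
        self.size = min(self.size + 1, self.max_size)

    def sample(self, batch_size, rng=random):
        # Uniform sampling with replacement over the valid entries.
        idxs = [rng.randrange(self.size) for _ in range(batch_size)]
        return [self.storage[i] for i in idxs]

    def __len__(self):
        return self.size


buf = RingBuffer(max_size=3)
for step in range(5):
    buf.add({"obs": step, "reward": float(step)})

# Steps 0 and 1 were overwritten by steps 3 and 4.
remaining = sorted(t["obs"] for t in buf.storage)
print(len(buf), remaining)
```

Preallocating the slots and moving a write index keeps both insertion and lookup by index constant-time, which is why this layout is a common choice for replay memories.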
- class agilerl.components.replay_buffer.MultiStepReplayBuffer(max_size: int, n_step: int = 3, gamma: float = 0.99, device: str | device = 'cpu', dtype: dtype = torch.float32)
A circular replay buffer for n-step returns in off-policy learning.
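- Parameters:
max_size (int) – Maximum number of transitions the buffer can hold
n_step (int) – Number of steps used for the n-step return, defaults to 3
gamma (float) – Discount factor, defaults to 0.99
device (str | torch.device) – Device used by the buffer, defaults to 'cpu'
dtype (torch.dtype) – Data type used by the buffer, defaults to torch.float32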
- add(data: TensorDict) → TensorDict | None
Add a transition to the n-step buffer and potentially to the replay buffer.
- Parameters:
data (TensorDict) – Transition to add to the buffer
- Returns:
First transition in the n-step buffer
- Return type:
TensorDict | None
- sample_from_indices(idxs: Tensor) → TensorDict
Sample a batch of transitions from the buffer using the provided indices.
- Parameters:
idxs (torch.Tensor) – Indices of the transitions to sample
- Returns:
TensorDict containing sampled experiences
- Return type:
TensorDict
- property storage: TensorDict
Storage of the buffer.
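The n-step idea behind this class can be sketched in plain Python. This is a hedged illustration, not AgileRL's code: NStepAccumulator and push() are hypothetical names, and transitions are plain dicts. Unlike the documented add(), which returns the first transition in the n-step buffer, push() returns the folded n-step transition so that the result is easy to inspect. Nothing is produced until n_step transitions have been collected, and the fold stops at the first terminal transition in the window.

```python
from collections import deque


class NStepAccumulator:
    """Hypothetical sketch of n-step folding, for illustration only."""

    def __init__(self, n_step=3, gamma=0.99):
        self.n_step = n_step
        self.gamma = gamma
        self.window = deque(maxlen=n_step)  # the most recent n_step transitions

    def push(self, transition):
        self.window.append(transition)
        if len(self.window) < self.n_step:
            return None  # not enough history yet

        # Start from the newest transition and fold backwards in time.
        last = self.window[-1]
        reward, next_obs, done = last["reward"], last["next_obs"], last["done"]
        for t in reversed(list(self.window)[:-1]):
            if t["done"]:
                # The episode ended here, so later transitions are ignored.
                reward, next_obs, done = t["reward"], t["next_obs"], True
            else:
                reward = t["reward"] + self.gamma * reward

        first = self.window[0]
        return {
            "obs": first["obs"],
            "action": first["action"],
            "reward": reward,  # discounted sum of rewards over the window
            "next_obs": next_obs,
            "done": done,
        }


acc = NStepAccumulator(n_step=3, gamma=0.5)
rewards = [1.0, 2.0, 4.0]
outputs = [
    acc.push({"obs": i, "action": 0, "reward": r, "next_obs": i + 1, "done": False})
    for i, r in enumerate(rewards)
]
print(outputs[-1])
```

With gamma = 0.5 the folded reward is 1 + 0.5 * 2 + 0.25 * 4 = 3, paired with the first observation and the observation that follows the third step.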
- class agilerl.components.replay_buffer.PrioritizedReplayBuffer(max_size: int, alpha: float = 0.6, device: str | device = 'cpu', dtype: dtype = torch.float32)
A prioritized replay buffer for off-policy learning as introduced in the paper ‘Prioritized Experience Replay’ (Schaul et al., 2015).
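- Parameters:
max_size (int) – Maximum number of transitions the buffer can hold
alpha (float) – Prioritization exponent, where 0 corresponds to uniform sampling, defaults to 0.6
device (str | torch.device) – Device used by the buffer, defaults to 'cpu'
dtype (torch.dtype) – Data type used by the buffer, defaults to torch.float32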
- add(data: TensorDict) → None
Add a transition to the buffer.
- Parameters:
data (TensorDict) – Transition to add to the buffer
- sample(batch_size: int, beta: float = 0.4) → TensorDict
Sample a batch of transitions based on priorities.
- property storage: TensorDict
Storage of the buffer.
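The roles of alpha and beta can be illustrated with the formulas from Schaul et al.: a transition with priority p_i is sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha, and its update is scaled by the importance weight w_i = (N * P(i))^(-beta). The sketch below is plain Python under stated assumptions, not AgileRL's implementation: both helper functions are hypothetical, a linear scan replaces the tree structures that practical implementations typically use, and the weights are normalised by the largest weight in the batch as a simplification.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 gives uniform sampling."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, idxs, beta=0.4):
    """w_i = (N * P(i))^(-beta), normalised by the largest weight in the batch."""
    n = len(probs)
    weights = [(n * probs[i]) ** (-beta) for i in idxs]
    max_w = max(weights)
    return [w / max_w for w in weights]


# Hypothetical priorities for four stored transitions.
priorities = [1.0, 2.0, 4.0, 1.0]
probs = sampling_probabilities(priorities, alpha=1.0)

# Draw a batch of indices in proportion to the probabilities.
rng = random.Random(0)
idxs = rng.choices(range(len(probs)), weights=probs, k=3)
weights = importance_weights(probs, idxs, beta=0.4)
print(probs, idxs, weights)
```

Transitions that are sampled more often receive smaller weights, which offsets the bias introduced by non-uniform sampling. Raising beta towards 1 makes that correction stronger.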