Experience Replay Buffer¶
To train a population of RL agents efficiently, off-policy algorithms must be used so that memory can be shared across the population. This reduces the exploration required of each individual agent, because it can learn faster from the behaviour of the other agents. For example, if you could watch a group of people attempt to solve a maze, you could learn from their mistakes and successes without having to explore the entire maze yourself.
The object used to store experiences collected by agents in the environment is called the Experience Replay Buffer, and is defined by the class ReplayBuffer(). During training, experiences can be added with ReplayBuffer.save2memory(), or ReplayBuffer.save2memoryVectEnvs() for vectorized environments (recommended). To sample from the replay buffer, call ReplayBuffer.sample().
from agilerl.components.replay_buffer import ReplayBuffer
import torch

field_names = ["state", "action", "reward", "next_state", "done"]
memory = ReplayBuffer(
    action_dim=action_dim,        # Number of agent actions
    memory_size=10000,            # Max replay buffer size
    field_names=field_names,      # Field names to store in memory
    device=torch.device("cuda"),  # Device for accelerated computing
)
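Once the buffer is constructed, a typical training loop saves each transition as it is collected, then draws a random batch for an off-policy learning step. Below is a minimal sketch assuming a gymnasium environment; the environment name is illustrative, the random action is a stand-in for an agent's policy, and sample() is assumed to return one batched array per field, in field_names order.

import gymnasium as gym

env = gym.make("CartPole-v1")  # illustrative environment
state, _ = env.reset()

for _ in range(100):
    action = env.action_space.sample()  # stand-in for an agent's action
    next_state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    # Save the transition; order must match field_names above
    memory.save2memory(state, action, reward, next_state, done)
    if done:
        state, _ = env.reset()
    else:
        state = next_state

# Sample a random batch of transitions for learning
states, actions, rewards, next_states, dones = memory.sample(batch_size=64)

With vectorized environments the same loop applies, but save2memoryVectEnvs() (or save2memory(..., is_vectorised=True)) receives batched arrays with one row per environment.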
Parameters¶
- class agilerl.components.replay_buffer.ReplayBuffer(action_dim, memory_size, field_names, device=None)¶
The Experience Replay Buffer class. Used to store experiences and allow off-policy learning.
- Parameters:
action_dim (int) – Action dimension
memory_size (int) – Maximum length of replay buffer
field_names (list[str]) – Field names for experience named tuple, e.g. [‘state’, ‘action’, ‘reward’]
device (str, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to None
- sample(batch_size, return_idx=False)¶
Returns a sample of experiences from memory.
- Parameters:
batch_size (int) – Number of samples to return
return_idx (bool, optional) – Whether to also return the indices of the sampled experiences, defaults to False
- save2memory(*args, is_vectorised=False)¶
Applies appropriate save2memory function depending on whether the environment is vectorised or not.
- Parameters:
*args –
Variable length argument list. Contains batched or unbatched transition elements in consistent order, e.g. states, actions, rewards, next_states, dones
is_vectorised (bool) – Boolean flag indicating if the environment has been vectorised
- save2memorySingleEnv(*args)¶
Saves experience to memory.
- Parameters:
*args –
Variable length argument list. Contains transition elements in consistent order, e.g. state, action, reward, next_state, done
- save2memoryVectEnvs(*args)¶
Saves multiple experiences to memory.
- Parameters:
*args –
Variable length argument list. Contains batched transition elements in consistent order, e.g. states, actions, rewards, next_states, dones
- class agilerl.components.replay_buffer.MultiStepReplayBuffer(action_dim, memory_size, field_names, num_envs, n_step=3, gamma=0.99, device=None)¶
The Multi-step Experience Replay Buffer class. Used to store experiences and allow off-policy learning.
- Parameters:
action_dim (int) – Action dimension
memory_size (int) – Maximum length of replay buffer
field_names (list[str]) – Field names for experience named tuple, e.g. [‘state’, ‘action’, ‘reward’]
num_envs (int) – Number of parallel environments for training
n_step (int, optional) – Step number to calculate n-step td error, defaults to 3
gamma (float, optional) – Discount factor, defaults to 0.99
device (str, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to None
- sample(batch_size, return_idx=False)¶
Returns a sample of experiences from memory.
- Parameters:
batch_size (int) – Number of samples to return
return_idx (bool, optional) – Whether to also return the indices of the sampled experiences, defaults to False
- sample_from_indices(idxs)¶
Returns a sample of experiences from memory using the provided indices.
- Parameters:
idxs (list[int]) – Indices of the experiences to return
- save2memory(*args, is_vectorised=False)¶
Applies appropriate save2memory function depending on whether the environment is vectorised or not.
- Parameters:
*args –
Variable length argument list. Contains batched or unbatched transition elements in consistent order, e.g. states, actions, rewards, next_states, dones
is_vectorised (bool) – Boolean flag indicating if the environment has been vectorised
- save2memorySingleEnv(*args)¶
Saves experience to memory.
- Parameters:
*args –
Variable length argument list. Contains transition elements in consistent order, e.g. state, action, reward, next_state, done
- save2memoryVectEnvs(*args)¶
Saves multiple experiences to memory.
- Parameters:
*args –
Variable length argument list. Contains batched transition elements in consistent order, e.g. states, actions, rewards, next_states, dones
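The multi-step buffer collects n_step consecutive transitions per environment and combines them into a single n-step return before committing to memory, which can reduce variance in TD targets. Below is a construction sketch, reusing the field names from the example above; num_envs is an illustrative value and action_dim is assumed to be defined elsewhere.

from agilerl.components.replay_buffer import MultiStepReplayBuffer
import torch

field_names = ["state", "action", "reward", "next_state", "done"]
n_step_memory = MultiStepReplayBuffer(
    action_dim=action_dim,        # Number of agent actions (assumed defined)
    memory_size=10000,            # Max replay buffer size
    field_names=field_names,      # Field names to store in memory
    num_envs=16,                  # Illustrative number of vectorized environments
    n_step=3,                     # Transitions combined per n-step return
    gamma=0.99,                   # Discount factor applied across the n steps
    device=torch.device("cuda"),  # Device for accelerated computing
)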
- class agilerl.components.replay_buffer.PrioritizedReplayBuffer(action_dim, memory_size, field_names, num_envs, alpha=0.6, n_step=1, gamma=0.99, device=None)¶
The Prioritized Experience Replay Buffer class. Used to store experiences and allow off-policy learning.
- Parameters:
action_dim (int) – Action dimension
memory_size (int) – Maximum length of replay buffer
field_names (list[str]) – Field names for experience named tuple, e.g. [‘state’, ‘action’, ‘reward’]
num_envs (int) – Number of parallel environments for training
alpha (float, optional) – Alpha parameter for prioritized replay buffer, defaults to 0.6
n_step (int, optional) – Step number to calculate n-step td error, defaults to 1
gamma (float, optional) – Discount factor, defaults to 0.99
device (str, optional) – Device for accelerated computing, ‘cpu’ or ‘cuda’, defaults to None
- sample(batch_size, beta=0.4)¶
Returns a sample of experiences from memory, weighted by priority.
- Parameters:
batch_size (int) – Number of samples to return
beta (float, optional) – Importance-sampling exponent used to correct the bias introduced by prioritized sampling, defaults to 0.4
- sample_from_indices(idxs)¶
Returns a sample of experiences from memory using the provided indices.
- Parameters:
idxs (list[int]) – Indices of the experiences to return
- save2memory(*args, is_vectorised=False)¶
Applies appropriate save2memory function depending on whether the environment is vectorised or not.
- Parameters:
*args –
Variable length argument list. Contains batched or unbatched transition elements in consistent order, e.g. states, actions, rewards, next_states, dones
is_vectorised (bool) – Boolean flag indicating if the environment has been vectorised
- save2memorySingleEnv(*args)¶
Saves experience to memory.
- Parameters:
*args –
Variable length argument list. Contains transition elements in consistent order, e.g. state, action, reward, next_state, done
- save2memoryVectEnvs(*args)¶
Saves multiple experiences to memory.
- Parameters:
*args –
Variable length argument list. Contains batched transition elements in consistent order, e.g. states, actions, rewards, next_states, dones
- update_priorities(idxs, priorities)¶
Update priorities of sampled transitions.
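Prioritized replay samples transitions in proportion to their TD-error-based priorities raised to the power alpha, and corrects the resulting sampling bias with importance weights controlled by beta. After each learning step, the priorities of the sampled transitions should be refreshed with update_priorities(). Below is a sketch of that loop; the return order of sample() (transition fields, then importance weights and indices) and the placeholder TD errors are assumptions.

from agilerl.components.replay_buffer import PrioritizedReplayBuffer
import numpy as np
import torch

field_names = ["state", "action", "reward", "next_state", "done"]
per_memory = PrioritizedReplayBuffer(
    action_dim=action_dim,        # Number of agent actions (assumed defined)
    memory_size=10000,            # Max replay buffer size
    field_names=field_names,      # Field names to store in memory
    num_envs=16,                  # Illustrative number of vectorized environments
    alpha=0.6,                    # Degree of prioritization (0 = uniform sampling)
    device=torch.device("cuda"),  # Device for accelerated computing
)

# ... fill per_memory with transitions via save2memory() ...

# Assumed return order: transition fields, then importance weights and indices
states, actions, rewards, next_states, dones, weights, idxs = per_memory.sample(
    64, beta=0.4
)

# After computing TD errors in the learning step (placeholders here), refresh
# priorities so that informative transitions are sampled more often
td_errors = np.abs(np.random.randn(64)) + 1e-6  # placeholder for real TD errors
per_memory.update_priorities(idxs, td_errors.tolist())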