Saving and Loading LLM Checkpoints

LLM checkpoints in AgileRL can persist just LoRA adapters, the full model, and optionally the optimizer/LR-scheduler state, with separate code paths for plain (single-process) training and distributed training via DeepSpeed + Accelerate. The defaults are lora_only=True and save_optimizer=True.

Checkpoint layout on disk

A typical checkpoint directory written by save_checkpoint() looks like:

checkpoint_dir/
├── attributes.pt              # algorithm hyperparameters; may also
│                              # contain the actor state_dict and/or
│                              # optimizer state depending on flags
├── actor/
│   ├── adapter_model.safetensors
│   └── adapter_config.json
├── reference/                 # only if use_separate_reference_adapter=True
│   ├── adapter_model.safetensors
│   └── adapter_config.json
├── critic/                    # only for algorithms with a value head
│   ├── adapter_model.safetensors
│   └── adapter_config.json
└── save_checkpoint/           # DeepSpeed sharded checkpoint; only when
                               # training with an Accelerator

Which adapter subdirectories appear depends on the algorithm:

  • SFT: actor only.

  • DPO, GRPO: actor + reference.

  • PPO-LLM (with value head): actor + reference + critic.

Saving

agent.save_checkpoint(
    path,
    lora_only=True,        # default: adapters only, no base weights
    save_optimizer=True,   # default: persist optimizer + LR scheduler
)

The four combinations on the non-distributed path:

lora_only

save_optimizer

Produces

True

True

Adapter dirs on disk; optimizer state inside attributes.pt.

True

False

Adapter dirs only. No optimizer state.

False

True

Full actor state_dict + optimizer state, both inside attributes.pt.

False

False

Full actor state_dict inside attributes.pt (no optimizer state).

On the DeepSpeed path, save_optimizer=True writes a sharded checkpoint into <path>/save_checkpoint/ via the engine instead of bundling optimizer state into attributes.pt. lora_only=True still writes adapter directories. The lora_only=False, save_optimizer=False cell gathers ZeRO-3 shards and injects the full state_dict into attributes.pt.

Common scenarios:

# Periodic snapshot during training (adapters + optimizer, so training
# can resume where it left off):
agent.save_checkpoint(path)

# Release a deployable artefact (adapters only, no training state):
agent.save_checkpoint(path, save_optimizer=False)

# Persist the full model, base weights included, not just the adapters
# (e.g. for hand-off to a consumer that can't re-download the base):
agent.save_checkpoint(path, lora_only=False, save_optimizer=False)

Loading

agent.load_checkpoint(
    path,
    load_optimizer=True,   # default: restore optimizer + LR scheduler
)

save_optimizer and load_optimizer are independent flags: you can load a checkpoint that contains optimizer state while passing load_optimizer=False to keep the live optimizer, or load a weights-only checkpoint with load_optimizer=True (in which case a UserWarning is emitted and the existing optimizer is kept as-is).

load_checkpoint() expects the live algorithm to already be configured against the same base model. It restores adapter weights on top of that base and, by default, copies the just-loaded actor adapter onto reference so that SFT → DPO → GRPO pipelines work out of the box: the actor trained in stage N becomes the reference for stage N+1.

The checkpoint’s LoRA config must match the live algorithm’s (rank, target modules, etc.); a mismatch raises ValueError. Re-create the agent with the checkpoint’s LoRA config to load it.

Common scenarios:

# Resume training:
agent.load_checkpoint(path)

# Inference / evaluation with a checkpoint that may or may not contain
# optimizer state, which we don't need:
agent.load_checkpoint(path, load_optimizer=False)

DeepSpeed and Accelerate

When an Accelerator with a DeepSpeedPlugin is attached, the save/load paths differ as follows:

  • save_optimizer=True delegates to the DeepSpeed engine’s own sharded checkpoint format, written to <path>/save_checkpoint/. The matching load path reads the same directory.

  • save_optimizer=False falls back to the PEFT / torch-save path, which produces the same adapter directories / attributes.pt as plain training.

  • ZeRO-3 sharded parameters are gathered via the appropriate gather context before being written, so the on-disk layout is identical regardless of ZeRO stage.

Multi-process correctness (only the main process writes attributes.pt, followed by accelerator.wait_for_everyone()) is handled internally; you call save_checkpoint() / load_checkpoint() the same way whether you’re on one GPU or many.