▶ Interactive Lab

CPU Training Memory Calculator

Adjust the model size to see how much RAM training needs.

Total RAM = weights + gradients + optimizer + activations.

What you're seeing

Per parameter: FP32 = 16 bytes (4 bytes each for weights, gradients, and the AdamW moments m and v). BF16 mixed ≈ 10 bytes. Activations scale with d·L·seq (hidden size × layers × sequence length).
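
The per-parameter figures above can be turned into a quick estimate. This is a minimal sketch, not the lab's actual code: the function name `static_memory_gb` and the mode keys are made up for illustration, GB is taken as 10^9 bytes, and activations are left out because they depend on batch size and sequence length.

```python
# Bytes per parameter for weights + gradients + optimizer state,
# using the figures quoted in the lab text.
BYTES_PER_PARAM = {
    "fp32": 16,        # 4 (weights) + 4 (gradients) + 4 (m) + 4 (v)
    "bf16_mixed": 10,  # approximate figure from the text; the exact split depends on the setup
}


def static_memory_gb(num_params: float, mode: str = "fp32") -> float:
    """Return weights + gradients + optimizer memory in GB (10^9 bytes).

    Activation memory is deliberately excluded (assumption of this sketch).
    """
    return num_params * BYTES_PER_PARAM[mode] / 1e9


if __name__ == "__main__":
    for n in (50e6, 350e6, 1e9, 3e9):
        fp32 = static_memory_gb(n)
        mixed = static_memory_gb(n, "bf16_mixed")
        print(f"{n / 1e6:>6.0f}M params: FP32 {fp32:5.1f} GB, BF16 mixed {mixed:5.1f} GB")
```

Whatever RAM is left after this static part is the budget for activations.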

★ KEY TAKEAWAY
CPU training memory ≈ 4 × the FP32 weight size (weights, gradients, and the two AdamW moments) + activations. 350M fits in 16 GB; 1B needs 64 GB.
▶ WHAT TO TRY
  • Slide Params from 50M to 3B to see the memory breakdown.
  • Toggle BF16 mixed: halves activation memory.
  • Toggle Grad checkpoint: saves ~70% of activation memory for about 33% extra compute.
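
The effect of the two toggles on activation memory can be sketched as follows. The function `apply_toggles` is hypothetical, and it assumes the two savings simply multiply when both toggles are on, which the lab may or may not do. The 0.5, 0.3, and 1.33 factors are the approximate figures from the tips above.

```python
def apply_toggles(activation_gb: float,
                  bf16_mixed: bool = False,
                  grad_checkpoint: bool = False) -> tuple[float, float]:
    """Return (activation memory in GB, relative compute cost) after the toggles."""
    compute = 1.0
    if bf16_mixed:
        activation_gb *= 0.5   # "halves activation memory"
    if grad_checkpoint:
        activation_gb *= 0.3   # "saves ~70%" of activation memory
        compute *= 1.33        # about 33% extra compute for recomputation
    return activation_gb, compute


if __name__ == "__main__":
    base = 10.0  # hypothetical baseline activation memory in GB
    for bf16 in (False, True):
        for ckpt in (False, True):
            mem, cost = apply_toggles(base, bf16, ckpt)
            print(f"bf16={bf16!s:<5} ckpt={ckpt!s:<5} -> {mem:4.1f} GB, compute x{cost:.2f}")
```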