▶ Interactive Lab

Gradient Clipping in Action

See spikes get truncated to max_norm.

On most steps the gradient norm stays below max_norm and nothing is clipped. An occasional spike exceeds max_norm and is scaled back down to it.

What you're seeing

The plot shows the gradient norm at each training step. Steps whose norm exceeds max_norm are clipped down to max_norm (red); steps at or below it pass through unchanged (green).
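The rule behind the red and green bars can be sketched in a few lines of plain Python. This is a minimal illustration, not the lab's own code: the helper name clip_by_global_norm and the flat list of gradient values are assumptions made for the example.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm.

    Hypothetical helper for illustration; returns the (possibly rescaled)
    gradients together with the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # At or below the threshold: pass through unchanged (green in the plot).
    if total_norm <= max_norm:
        return list(grads), total_norm
    # Above the threshold: shrink every component by the same factor so the
    # direction is preserved and the norm becomes max_norm (red in the plot).
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

# A gradient with norm 5 is scaled down to norm 1.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
# A gradient already below the threshold is left alone.
untouched, small_norm = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
```

Because every component is multiplied by the same factor, clipping changes the size of the update but not its direction.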

★ KEY TAKEAWAY
Gradient norm clipping caps the blow-ups caused by gradient spikes. max_norm=1 is a common default for LLM training.
▶ WHAT TO TRY
  • Slide max_norm low to see lots of clipping (many red bars, all truncated).
  • Set it very high: spikes pass through unclipped, which could derail real training.
  • Click Simulate to generate a new sequence of gradient norms with rare spikes.
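The experiments above can also be approximated offline. The sketch below generates a sequence of gradient norms with rare spikes and reports the fraction of steps that would be clipped at several thresholds. The distributions, the spike probability, and the function names are assumptions chosen for illustration; they are not the lab's actual generator.

```python
import random

def simulate_norms(steps, spike_prob=0.05, seed=0):
    """Return hypothetical per-step gradient norms with occasional spikes."""
    rng = random.Random(seed)
    norms = []
    for _ in range(steps):
        norm = rng.uniform(0.3, 0.9)          # typical step (assumed range)
        if rng.random() < spike_prob:
            norm *= rng.uniform(5.0, 20.0)    # rare spike (assumed size)
        norms.append(norm)
    return norms

def clip_fraction(norms, max_norm):
    """Fraction of steps whose norm exceeds max_norm and would be clipped."""
    return sum(1 for n in norms if n > max_norm) / len(norms)

norms = simulate_norms(200)
for max_norm in (0.5, 1.0, 100.0):
    print(f"max_norm={max_norm}: {clip_fraction(norms, max_norm):.1%} of steps clipped")
```

Lowering max_norm can only increase the clipped fraction, and a threshold above every simulated norm clips nothing, which mirrors what the slider shows.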