▶ Interactive Lab

RMSNorm vs LayerNorm — Side by Side

See the difference: RMSNorm skips mean centering.

LayerNorm: (x - mean) / std. RMSNorm: x / RMS, where RMS is the root mean square of x. One statistic instead of two.
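The two formulas can be sketched in a few lines of NumPy. This is a minimal illustration, not the lab's own code: the function names `layer_norm` and `rms_norm`, the `eps` stabilizer, and the sample vector are assumptions, and the learned gain and bias that real layers usually carry are left out.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm without learned gain/bias: needs mean and variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # population variance (ddof=0)
    return (x - mean) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    """RMSNorm without learned gain: needs only the mean of squares."""
    mean_sq = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(mean_sq + eps)

# Hypothetical input with a clearly non-zero mean.
x = np.array([[1.0, 2.0, 3.0, 6.0]])
ln = layer_norm(x)
rn = rms_norm(x)
print("LayerNorm:", ln, "mean:", ln.mean())
print("RMSNorm:  ", rn, "mean:", rn.mean())
```

LayerNorm's output has mean 0 and unit variance. RMSNorm's output has unit mean square, but its mean stays wherever the rescaling puts it.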

What you're seeing

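Empirically, dropping mean centering barely affects quality but saves arithmetic: RMSNorm computes one statistic per vector and skips the mean subtraction. Many recent open LLMs (for example Llama, Mistral, and recent Phi models) use RMSNorm.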
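A short identity shows how the two normalizers relate. The symbols are assumptions introduced here: d is the number of features, and μ and σ² are the mean and (population) variance of x over those features.

```latex
\mathrm{RMS}(x)^2 \;=\; \frac{1}{d}\sum_{i=1}^{d} x_i^2 \;=\; \mu^2 + \sigma^2,
\qquad
\mu = \frac{1}{d}\sum_{i=1}^{d} x_i,
\quad
\sigma^2 = \frac{1}{d}\sum_{i=1}^{d} (x_i - \mu)^2 .
```

When μ = 0, RMS(x) equals σ and the two normalizers give the same output (ignoring any epsilon and learned parameters). The further the mean is from zero, the more they differ. This does not by itself explain the empirical result, but it shows that the two only disagree to the extent that activations have a non-zero mean.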

★ KEY TAKEAWAY
RMSNorm = LayerNorm minus mean centering. Roughly 10–15% faster (the exact gain depends on hardware and implementation), comparable quality, and the modern default in Llama, Mistral, and Phi.
▶ WHAT TO TRY
  • Click Resample on inputs with non-zero mean.
  • LayerNorm centers them to mean = 0. RMSNorm only rescales, so its output keeps a non-zero mean.
  • Empirically, this difference has little effect on quality in transformer models.
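The Resample experiment can also be approximated offline. The sketch below is a stand-in for the widget, not its actual code: the seed, the input distribution (mean 3.0, standard deviation 1.0, 8 features), and `eps` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the "resample" is repeatable
eps = 1e-5                      # assumed numerical stabilizer

# One "resample": 8 features drawn around a non-zero mean of 3.0.
x = rng.normal(loc=3.0, scale=1.0, size=8)

# LayerNorm: subtract the mean, divide by the standard deviation.
layer_out = (x - x.mean()) / np.sqrt(x.var() + eps)

# RMSNorm: divide by the root mean square, with no centering.
rms_out = x / np.sqrt(np.mean(x ** 2) + eps)

print(f"input mean:     {x.mean():+.3f}")
print(f"LayerNorm mean: {layer_out.mean():+.3f}")
print(f"RMSNorm mean:   {rms_out.mean():+.3f}")
```

Changing `loc` to 0.0 should bring the two outputs close together, while larger offsets make the gap between them more visible.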