▶ Interactive Lab

LayerNorm Statistics

Watch mean, variance, and normalized output for a tensor.

Advertisement
Computes mean and variance ACROSS feature dim. Per-token.

What you're seeing

For each token: subtract mean, divide by std. Then apply learnable γ (scale) and β (shift). Shown as raw input, mean line, variance band, normalized output.

★ KEY TAKEAWAY
LayerNorm: subtract mean, divide by std, then learnable scale + shift. Per-token; no batch dependence.
▶ WHAT TO TRY
  • Click Resample to see how arbitrary inputs always end up with mean=0, std=1 after normalization.
  • The learned γ (scale) and β (shift) let the model un-normalize where useful.