LayerNorm Statistics — Belgavi.AI Lab

Advertisement

Computes mean and variance ACROSS feature dim. Per-token.

What you're seeing

For each token: subtract mean, divide by std. Then apply learnable γ (scale) and β (shift). Shown as raw input, mean line, variance band, normalized output.

★ KEY TAKEAWAY

LayerNorm: subtract mean, divide by std, then learnable scale + shift. Per-token; no batch dependence.

▶ WHAT TO TRY

Click Resample to see how arbitrary inputs always end up with mean=0, std=1 after normalization.
The learned γ (scale) and β (shift) let the model un-normalize where useful.