▶ Interactive Lab

Complete Transformer Block

Animate data flowing through pre-norm + attention + residual + pre-norm + FFN + residual.

A modern transformer block: 6 stages from input to output.

What you're seeing

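Pre-norm architecture: the input is normalized first, then passed through the sub-block (attention), and the result is added back to the un-normalized input through a residual connection. The same three steps repeat for the feed-forward network (FFN). Every layer follows this pattern, and stacking L such layers forms the core of the full transformer.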
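The six stages can be sketched in a few lines of NumPy. This is a minimal illustration, not the lab's own code: it assumes RMSNorm as the normalization, a single causal attention head, a ReLU feed-forward network, and no positional encoding. All function and parameter names (rms_norm, pre_norm_block, init_params, and so on) are hypothetical.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm over the last axis (an assumption; LayerNorm fits the same slot)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * gain

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, wq, wk, wv, wo):
    """Single-head causal attention; x has shape (seq_len, d_model)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Mask positions above the diagonal so a token cannot see later tokens.
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v @ wo

def ffn(x, w1, w2):
    """Two-layer feed-forward network with ReLU (a simplification)."""
    return np.maximum(x @ w1, 0.0) @ w2

def pre_norm_block(x, p):
    """One pre-norm block: the six stages, in order."""
    h = rms_norm(x, p["g1"])                                          # 1. pre-norm
    h = causal_self_attention(h, p["wq"], p["wk"], p["wv"], p["wo"])  # 2. attention
    x = x + h                                                         # 3. residual add
    h = rms_norm(x, p["g2"])                                          # 4. pre-norm
    h = ffn(h, p["w1"], p["w2"])                                      # 5. FFN
    return x + h                                                      # 6. residual add

def init_params(rng, d_model, d_ff, scale=0.02):
    """Random illustrative weights for one block (hypothetical helper)."""
    def w(rows, cols):
        return rng.normal(scale=scale, size=(rows, cols))
    return {
        "g1": np.ones(d_model), "g2": np.ones(d_model),
        "wq": w(d_model, d_model), "wk": w(d_model, d_model),
        "wv": w(d_model, d_model), "wo": w(d_model, d_model),
        "w1": w(d_model, d_ff), "w2": w(d_ff, d_model),
    }

# Stack L blocks: the same pattern applied L times, each with its own weights.
rng = np.random.default_rng(0)
L, seq_len, d_model, d_ff = 4, 5, 8, 32
layers = [init_params(rng, d_model, d_ff) for _ in range(L)]
x = rng.normal(size=(seq_len, d_model))
y = x
for p in layers:
    y = pre_norm_block(y, p)
```

Note that the residual additions in stages 3 and 6 use the un-normalized stream, so the input always has a direct path to the output; only the copy fed into each sub-block is normalized.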

★ KEY TAKEAWAY
The standard block in modern decoder-only transformers is pre-norm + attention + residual + pre-norm + FFN + residual. Same pattern, L times.
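Written as a recurrence over layers, the pattern looks roughly as follows. The notation is an assumption for illustration: x_0 is the embedded input, h_ℓ is the stream after the attention residual, and Norm stands for whichever normalization the model uses, with separate learned parameters at each occurrence.

```latex
\begin{aligned}
h_\ell &= x_{\ell-1} + \mathrm{Attn}_\ell\big(\mathrm{Norm}(x_{\ell-1})\big) \\
x_\ell &= h_\ell + \mathrm{FFN}_\ell\big(\mathrm{Norm}(h_\ell)\big),
\qquad \ell = 1, \dots, L
\end{aligned}
```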
▶ WHAT TO TRY
  • Click Next stage to walk through one block in sequence.
  • The same pattern, with minor per-model variations, is used in Llama, Mistral, Phi, Qwen, and Gemma.