▶ Interactive Lab

SLM Parameter Breakdown

Where the parameters live: embedding, attention, FFN.

FFN dominates, attention comes second, and the embedding share becomes negligible for large models.

What you're seeing

For d=2048 (model width) and L=24 (layers): ~600M params, of which FFN ~60%, attention ~25%, embedding ~10% (tied).
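
A rough way to reproduce this kind of breakdown by hand is sketched below. It is not the lab's own formula, which this page does not state. It assumes a textbook decoder block: 4·d² attention parameters per layer (Q, K, V, and output projections), an FFN with 4x expansion (8·d² per layer), a 32,000-token vocabulary, and no biases, layer norms, or positional embeddings. The function name `param_breakdown` and all default values are illustrative. Because of these assumptions, the printed total and shares need not match the figures above.

```python
def param_breakdown(d, n_layers, vocab=32_000, ffn_mult=4, tied=True):
    """Approximate parameter counts for a decoder-only transformer.

    Assumed, not taken from the lab: standard multi-head attention,
    a two-matrix FFN, and no biases, norms, or positional embeddings.
    """
    attention = n_layers * 4 * d * d            # Q, K, V, and output projections
    ffn = n_layers * 2 * ffn_mult * d * d       # up- and down-projection
    embedding = vocab * d * (1 if tied else 2)  # untied adds a separate output matrix
    total = attention + ffn + embedding
    counts = {"attention": attention, "ffn": ffn, "embedding": embedding}
    shares = {name: n / total for name, n in counts.items()}
    return total, counts, shares


total, counts, shares = param_breakdown(d=2048, n_layers=24)
print(f"total: {total / 1e6:.0f}M")
for name, share in shares.items():
    print(f"{name:>9}: {share:.1%}")
```

Changing `ffn_mult`, `vocab`, or `tied` shows how sensitive the split is to choices the sliders do not expose.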

★ KEY TAKEAWAY
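FFN holds ~60% of transformer params. Attention ~25%. Embedding ~10% (tied) or ~20% (untied). For SLMs, the embedding share matters more, because embedding parameters grow linearly with d while attention and FFN parameters grow with d².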
▶ WHAT TO TRY
  • Slide d down to see embedding share grow.
  • Toggle Tied to halve the embedding cost.
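
The two experiments above can also be sketched offline. The snippet below is a minimal illustration, not the lab's implementation: it assumes 12·d² parameters per layer (attention plus a 4x FFN), a 32,000-token vocabulary, and 24 layers, and the helper `embedding_share` is a made-up name. Since the layer term grows with d² and the embedding term with d, the embedding share should rise as d shrinks, and untying adds a second vocab×d matrix.

```python
def embedding_share(d, n_layers=24, vocab=32_000, ffn_mult=4, tied=True):
    """Fraction of parameters held by the embedding (and output) matrices.

    Assumptions: 4*d^2 attention + 2*ffn_mult*d^2 FFN per layer, no biases or norms.
    """
    blocks = n_layers * (4 + 2 * ffn_mult) * d * d
    embedding = vocab * d * (1 if tied else 2)  # untied: separate output projection
    return embedding / (blocks + embedding)


# Slide d down and compare tied vs. untied embeddings.
for d in (2048, 1024, 512, 256):
    tied = embedding_share(d, tied=True)
    untied = embedding_share(d, tied=False)
    print(f"d={d:>4}  tied: {tied:.1%}  untied: {untied:.1%}")
```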