▶ Interactive Lab

Tied Embeddings Savings

Compare parameter counts for untied vs. tied input/output embeddings in small language models (SLMs).

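Tied: the output projection reuses the input embedding matrix (lm_head = embedding.T), which saves V·d params, where V is the vocabulary size and d is the model width.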
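Here is a minimal pure-Python sketch of what tying means. It assumes a toy model that has only an embedding table and an LM head, and the class `TinyLMHeads` and its methods are hypothetical, not any library's API. The untied variant allocates a second V×d matrix. The tied variant points the output projection at the same matrix, so `lm_head(h)` computes h·Eᵀ and the parameter count drops by exactly V·d. In PyTorch the same sharing is commonly written as `lm_head.weight = embedding.weight`, because both weights are stored with shape (V, d).

```python
import random


def make_matrix(rows, cols, seed=0):
    """Return a rows x cols matrix (list of lists) with small random values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


class TinyLMHeads:
    """Hypothetical toy: an input embedding plus an output projection, tied or untied."""

    def __init__(self, vocab_size, d_model, tied):
        self.embedding = make_matrix(vocab_size, d_model, seed=0)  # E: V x d
        # Tied: reuse E itself as the output weight, so logits = h @ E.T.
        # Untied: allocate a separate V x d output weight matrix.
        self.out_weight = self.embedding if tied else make_matrix(vocab_size, d_model, seed=1)

    def embed(self, token_id):
        # Input side: look up one row of E.
        return self.embedding[token_id]

    def lm_head(self, h):
        # Output side: one logit per vocab row, logits[v] = <h, W_out[v]>.
        return [sum(a * b for a, b in zip(h, row)) for row in self.out_weight]

    def num_params(self):
        # Count each distinct matrix once, so a shared matrix is not double-counted.
        unique = {id(m): m for m in (self.embedding, self.out_weight)}
        return sum(len(m) * len(m[0]) for m in unique.values())


tied = TinyLMHeads(vocab_size=1000, d_model=16, tied=True)
untied = TinyLMHeads(vocab_size=1000, d_model=16, tied=False)
print(tied.num_params(), untied.num_params())  # 16000 32000
```

With V = 1000 and d = 16, the tied model reports 16,000 parameters and the untied one reports 32,000. The difference is exactly V·d.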

What you're seeing

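For SLMs, d and the layer count are small, so the V·d embedding matrix is a large share of all parameters and tying saves a significant fraction. In a 70B model the same matrix is a small share of the total, so the embeddings are often kept untied.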
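To see why the saving matters more for small models, here is a rough back-of-the-envelope calculation. It assumes the common approximation of about 12·d² parameters per transformer layer (attention plus a 4×-wide MLP, ignoring norms, biases, and GQA/SwiGLU details). It also uses two hypothetical configurations: an SLM-like shape and a 70B-class shape. The numbers are illustrative, not measurements of any specific model.

```python
def body_params(n_layers, d_model):
    # Rough approximation: ~12 * d^2 parameters per transformer layer
    # (4d^2 attention + 8d^2 MLP at 4x width); norms and biases ignored.
    return 12 * n_layers * d_model ** 2


def tying_saving_fraction(vocab_size, d_model, n_layers):
    # Fraction of the *untied* model's parameters that tying removes (V*d).
    embed = vocab_size * d_model
    untied_total = body_params(n_layers, d_model) + 2 * embed
    return embed / untied_total


# Hypothetical shapes, chosen only to illustrate the trend.
slm = tying_saving_fraction(vocab_size=32_000, d_model=1024, n_layers=24)
big = tying_saving_fraction(vocab_size=128_000, d_model=8192, n_layers=80)
print(f"SLM-like:  {slm:.1%}")  # SLM-like:  8.9%
print(f"70B-class: {big:.1%}")  # 70B-class: 1.6%
```

Under these assumptions, tying removes about 9% of the SLM's untied parameters but under 2% of the 70B-class model's parameters.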

★ KEY TAKEAWAY
Tied embeddings: lm_head = embedding.T. Saves V·d params (often 10% or more of an SLM's total). Standard for small models.
▶ WHAT TO TRY
  • Slide vocab V from 8K to 200K: a bigger vocab means a bigger saving, since V·d grows linearly with V.
  • Large Llama models keep them untied (they have params to spare), while Gemma, the small Qwen checkpoints, some Phi models, and even Llama 3.2 1B/3B tie them.
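The vocab slider from the first bullet can be reproduced offline. This sketch fixes a hypothetical SLM width (`D_MODEL = 1024`) and a hypothetical non-embedding parameter count (`BODY_PARAMS = 300M`), and only V changes. The saving V·d grows linearly with V, and its share of the untied total rises with it.

```python
D_MODEL = 1024             # hypothetical SLM width
BODY_PARAMS = 300_000_000  # hypothetical non-embedding parameter count


def saving(vocab_size, d_model=D_MODEL):
    # Tying removes one full V x d matrix.
    return vocab_size * d_model


for V in (8_000, 32_000, 64_000, 128_000, 200_000):
    saved = saving(V)
    untied_total = BODY_PARAMS + 2 * saved  # body + embedding + separate lm_head
    print(f"V={V:>7,}  saved={saved / 1e6:6.1f}M  ({saved / untied_total:.1%} of untied total)")
```

Going from 8K to 200K multiplies the absolute saving by 25. Because the body size is held fixed, the percentage saved rises steeply as well.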