Sinusoidal (original) adds position to embeddings. RoPE (modern) rotates Q/K vectors.
What you're seeing
Sinusoidal: from the original Transformer paper. Fixed sin/cos values at different frequencies are added to each token embedding before the first attention layer.
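As a rough illustration of the sinusoidal scheme, here is a minimal pure-Python sketch. The function names `sinusoidal_encoding` and `add_position` are made up for this example, and the base of 10000 and the sin/cos interleaving follow the usual formulation from the original paper; this demo's own implementation may differ.

```python
import math

def sinusoidal_encoding(num_positions, d_model, base=10000.0):
    """Build a num_positions x d_model table of fixed sin/cos values.

    Even columns hold sin, odd columns hold cos. Each column pair uses
    its own frequency, which falls as the column index grows.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, d_model, 2):
            freq = base ** (-i / d_model)  # i is the even column index
            row.append(math.sin(pos * freq))
            row.append(math.cos(pos * freq))
        table.append(row)
    return table

def add_position(embeddings, table):
    """Add the position table to the token embeddings, element by element."""
    return [[e + p for e, p in zip(emb, pe)]
            for emb, pe in zip(embeddings, table)]

pe = sinusoidal_encoding(num_positions=4, d_model=8)
tokens = [[0.0] * 8 for _ in range(4)]  # placeholder embeddings
with_position = add_position(tokens, pe)
```

Plotting `pe` as a heatmap with positions as rows and dimensions as columns should give stripes that vary quickly in the low-index columns and slowly in the high-index ones.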
RoPE: rotates the Q and K vectors by an angle proportional to their position. The Q·K dot product then depends only on the relative offset between the two positions. It tends to extrapolate better to longer sequences and is used by Llama, Mistral, and many other recent LLMs.
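The relative-position property can be checked numerically. The sketch below is a simplified, assumed version of RoPE: the helper names `rope` and `dot` are hypothetical, and it pairs adjacent dimensions, whereas some real implementations pair the first half of the vector with the second half instead.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[2k], vec[2k+1]) by the angle pos * theta_k."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for k in range(d // 2):
        theta = base ** (-2 * k / d)  # per-pair frequency
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * k], vec[2 * k + 1]
        out.append(x * c - y * s)
        out.append(x * s + y * c)
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary example query and key vectors.
q = [0.3, -1.2, 0.5, 0.8]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset (3) at two different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))

# A different offset (4) for comparison.
score_other = dot(rope(q, 5), rope(k, 1))
```

`score_near` and `score_far` agree up to floating-point error because rotating both vectors by the same extra angle leaves their dot product unchanged. Changing the offset, as in `score_other`, generally changes the score.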
★ KEY TAKEAWAY
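Sinusoidal adds a fixed pattern to the embeddings. RoPE rotates Q/K so that attention scores depend on relative position. Both inject position information; RoPE generally extrapolates better to longer sequences.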
▶ WHAT TO TRY
- Switch between sinusoidal and RoPE.
- Look across the heatmap: different dimension indices have different frequencies.