Different strategies for stretching RoPE beyond its trained context length.
What you're seeing
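Linear (position interpolation): divide every position by the scale factor. NTK-aware: enlarge the frequency base instead of touching positions. YaRN: NTK-by-parts interpolation plus an attention-temperature correction.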
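A minimal NumPy sketch of how each strategy rewrites the per-dimension RoPE frequencies θ_i = base^(-2i/d). Assumptions: base 10000, the function names are hypothetical, and the YaRN ramp bounds α = 1, β = 32 and the temperature formula sqrt(1/t) = 0.1 ln(s) + 1 follow the values the YaRN paper suggests for LLaMA-family models. Production implementations may differ in details, such as how the ramp boundaries are computed and where the temperature is applied.

```python
import math
from typing import Tuple

import numpy as np


def rope_inv_freq(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE frequencies theta_i = base^(-2i/dim), i = 0 .. dim/2 - 1."""
    return base ** (-np.arange(0, dim, 2, dtype=np.float64) / dim)


def linear_pi(dim: int, scale: float, base: float = 10000.0) -> np.ndarray:
    """Linear position interpolation: angle m*theta_i becomes (m/scale)*theta_i,
    which is the same as dividing every frequency by one global factor."""
    return rope_inv_freq(dim, base) / scale


def ntk_aware(dim: int, scale: float, base: float = 10000.0) -> np.ndarray:
    """NTK-aware scaling: keep positions, enlarge the base to base * scale^(dim/(dim-2))."""
    return rope_inv_freq(dim, base * scale ** (dim / (dim - 2)))


def yarn(dim: int, scale: float, orig_ctx: int, base: float = 10000.0,
         alpha: float = 1.0, beta: float = 32.0) -> Tuple[np.ndarray, float]:
    """YaRN sketch: NTK-by-parts interpolation plus an attention temperature.
    Returns (inv_freq, attention_factor)."""
    theta = rope_inv_freq(dim, base)
    wavelength = 2 * math.pi / theta
    # Number of full rotations each dimension completes inside the original context.
    r = orig_ctx / wavelength
    # gamma = 0: interpolate fully (like linear PI); gamma = 1: leave the frequency as is.
    gamma = np.clip((r - alpha) / (beta - alpha), 0.0, 1.0)
    inv_freq = (1 - gamma) * theta / scale + gamma * theta
    # sqrt(1/t) = 0.1 ln(s) + 1; multiplying cos/sin by this scales q.k logits by 1/t.
    attention_factor = 0.1 * math.log(scale) + 1.0
    return inv_freq, attention_factor


if __name__ == "__main__":
    dim, s, orig_ctx = 128, 4.0, 4096  # e.g. stretch a 4K-context model to 16K
    base_freqs = rope_inv_freq(dim)
    for name, freqs in [("linear", linear_pi(dim, s)),
                        ("ntk-aware", ntk_aware(dim, s)),
                        ("yarn", yarn(dim, s, orig_ctx)[0])]:
        ratio = freqs / base_freqs
        # Ratio for the fastest (i = 0) and slowest (last) dimension.
        print(f"{name:10s} fastest={ratio[0]:.3f} slowest={ratio[-1]:.3f}")
```

The YaRN ramp behaves like linear PI for dimensions that complete fewer than α rotations in the original window and leaves dimensions that complete more than β rotations untouched, blending linearly in between.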
★ KEY TAKEAWAY
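RoPE extension lets a model trained on a short context window run on longer sequences. NTK-aware scaling and YaRN preserve the high-frequency (short-wavelength) dimensions that encode fine-grained local position; linear PI compresses every dimension by the same factor and loses that high-frequency information.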
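A short derivation makes the takeaway concrete. The symbols are introduced here for illustration: head dimension d, RoPE base b, and scale factor s = L'/L (new context length over trained context length). Dimension pair i rotates at frequency θ_i, and each strategy rescales it as follows:

```latex
\begin{aligned}
\theta_i &= b^{-2i/d}, \qquad i = 0, \ldots, \tfrac{d}{2} - 1 \\
\text{Linear PI:}\quad \frac{\theta_i'}{\theta_i} &= \frac{1}{s} \quad \text{for every } i \\
\text{NTK-aware:}\quad \theta_i' &= \left(b\, s^{d/(d-2)}\right)^{-2i/d} = \theta_i\, s^{-2i/(d-2)} \\
\Rightarrow\quad \frac{\theta_0'}{\theta_0} &= 1, \qquad \frac{\theta_{d/2-1}'}{\theta_{d/2-1}} = s^{-1}
\end{aligned}
```

Dimension i = 0 rotates fastest and carries the finest positional detail: linear PI slows it by the full factor s, while NTK-aware scaling leaves it unchanged and applies the full 1/s only to the slowest dimension.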
▶ WHAT TO TRY
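- Switch between extension strategies and watch how the RoPE angle pattern changes past the training length.
- LongRoPE, which Phi-3 uses for its 128K-context variants, targets extreme lengths (128K and beyond) by searching for a separate rescale factor for each RoPE dimension.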
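LongRoPE's non-uniform idea can be sketched as per-dimension rescaling: instead of one global factor (linear PI) or a closed-form schedule (NTK-aware, YaRN), each frequency θ_i is divided by its own factor λ_i. This sketch only shows how such factors would be applied. The factor values are hypothetical and made up for illustration (they are not Phi-3's searched values), the function name is hypothetical, and the search procedure and LongRoPE's other details are omitted.

```python
import numpy as np


def rope_inv_freq(dim: int, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE frequencies theta_i = base^(-2i/dim), i = 0 .. dim/2 - 1."""
    return base ** (-np.arange(0, dim, 2, dtype=np.float64) / dim)


def rescale_per_dim(dim: int, factors: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Divide each frequency by its own rescale factor lambda_i (LongRoPE-style).
    `factors` must hold one value per frequency, i.e. dim // 2 entries."""
    factors = np.asarray(factors, dtype=np.float64)
    if factors.shape != (dim // 2,):
        raise ValueError(f"expected {dim // 2} factors, got shape {factors.shape}")
    return rope_inv_freq(dim, base) / factors


# Hypothetical factors for illustration only: 1.0 for the fastest dimension,
# rising geometrically to the full scale s for the slowest dimension.
dim, s = 128, 32.0
hypothetical_factors = np.geomspace(1.0, s, dim // 2)
inv_freq = rescale_per_dim(dim, hypothetical_factors)

# Rotation angles (radians) each dimension pair uses at token position 100,000.
angles = 100_000 * inv_freq
```

Setting every factor to s recovers linear PI, and any closed-form schedule (such as the NTK-aware one) is just one particular choice of λ_i; a search-based method is free to pick factors that no simple formula produces.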