Full ΔW: d×d params. LoRA A·B: 2·d·r params. Savings = d²/(2dr) = d/(2r).
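To make the arithmetic concrete, here is a small sketch that evaluates the two parameter counts and their ratio. The hidden size d = 4096 is an assumption chosen for illustration (the text does not fix d), and the helper names are hypothetical.

```python
def full_params(d: int) -> int:
    """Trainable parameters in a full d x d update ΔW."""
    return d * d


def lora_params(d: int, r: int) -> int:
    """Trainable parameters in A (d x r) plus B (r x d)."""
    return 2 * d * r


def savings(d: int, r: int) -> float:
    """Ratio d^2 / (2dr), which simplifies to d / (2r)."""
    return full_params(d) / lora_params(d, r)


if __name__ == "__main__":
    d = 4096  # assumed hidden size, for illustration only
    for r in (4, 8, 16, 32, 64):
        print(f"r={r:>2}  LoRA params={lora_params(d, r):>9,}  savings={savings(d, r):.0f}x")
```

Under that assumed d, r = 8 gives a ratio of 256, which is in line with the "~250×" figure; a different d scales the ratio proportionally.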
What you're seeing
r=8-32 captures most fine-tuning gains. Up to ~250× fewer trainable params.
★ KEY TAKEAWAY
LoRA: replace the full ΔW (d×d) with the product A·B, where A is d×r and B is r×d. Up to ~250× fewer trainable params. A standard fine-tuning recipe in 2026.
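The factorization can be sketched in a few lines of NumPy. This is a minimal illustration and not a training implementation: the sizes d = 512 and r = 8 are assumptions, and the random A and B are stand-ins for values that training would normally produce.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8  # assumed sizes, for illustration only

W = rng.standard_normal((d, d))         # frozen pretrained weight, never updated
A = rng.standard_normal((d, r)) * 0.01  # trainable factor, d x r (stand-in values)
B = rng.standard_normal((r, d)) * 0.01  # trainable factor, r x d (stand-in values)

x = rng.standard_normal((1, d))         # one input row vector

# Low-rank path: apply A then B, so the d x d update is never materialized.
y_lora = x @ W + (x @ A) @ B

# Reference path: build ΔW = A·B explicitly and add it to W.
delta_W = A @ B
y_full = x @ (W + delta_W)

trainable = A.size + B.size             # 2·d·r
print("outputs match:", np.allclose(y_lora, y_full))
print("trainable params:", trainable, "vs full:", W.size)
print("rank of ΔW:", np.linalg.matrix_rank(delta_W))
```

Both paths give the same output; the difference is that only A and B need gradients, and ΔW has rank at most r by construction.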
▶ WHAT TO TRY
- Slide r down to 4 — see how few params get trained.
- Slide r up to 64: most fine-tuning gains are already captured at r=8-32, with diminishing returns beyond that.