Advertisement
Top-k keeps fixed K candidates. Top-p keeps a variable set whose probability sums to p.
What you're seeing
Top-k: keep only the top K most-likely tokens; renormalize. Simple and fast. Awkward when the distribution is very flat (K cuts off too much) or very peaked (K includes garbage).
Top-p (nucleus): keep the smallest set whose cumulative probability ≥ p. Adapts to the distribution shape — keeps fewer tokens when peaked, more when flat. The 2026 default.
★ KEY TAKEAWAY
Top-k keeps a fixed count; top-p keeps a variable set with cumulative prob ≥ p. Top-p adapts to distribution shape.
▶ WHAT TO TRY
- Slide both k and p — see how the kept set differs.
- Green = kept by both. Red = top-k only. Yellow = top-p only.