Top-k vs Top-p (Nucleus) Sampling



Top-k keeps a fixed number K of candidate tokens. Top-p keeps a variable-size set whose cumulative probability reaches at least p.

How they differ

Top-k: keep only the K most likely tokens, then renormalize their probabilities. Simple and fast, but awkward when the distribution is very flat (a small K cuts off many plausible tokens) or very peaked (K still admits low-probability junk tokens).
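A minimal NumPy sketch of top-k filtering, assuming the model's next-token probabilities are already available as a 1-D array `probs` that sums to 1. The function name `top_k_filter` is hypothetical and not taken from any particular library.

```python
import numpy as np

def top_k_filter(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep the k most likely tokens and renormalize (hypothetical helper).

    `probs` is a 1-D array of token probabilities summing to 1.
    Ties at the k-th position are broken arbitrarily by argpartition.
    """
    if not 1 <= k <= probs.size:
        raise ValueError("k must be between 1 and the vocabulary size")
    # Indices of the k largest probabilities (their order among themselves does not matter).
    keep = np.argpartition(probs, -k)[-k:]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    # Renormalize so the kept probabilities sum to 1.
    return filtered / filtered.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])  # toy 5-token distribution
filtered = top_k_filter(probs, k=2)            # keeps tokens 0 and 1, reweighted to 5/7 and 2/7
rng = np.random.default_rng(0)
token = rng.choice(probs.size, p=filtered)     # sample one token id from the truncated distribution
```

`np.argpartition` is used instead of a full sort because only the membership of the top k matters, not their order.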

Top-p (nucleus): keep the smallest set of most likely tokens whose cumulative probability is ≥ p, then renormalize. This adapts to the distribution's shape: it keeps fewer tokens when the distribution is peaked and more when it is flat. It is the usual default as of 2026.
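A minimal NumPy sketch of nucleus filtering under the same assumption (a 1-D `probs` array summing to 1). The name `top_p_filter` is hypothetical. Tokens are sorted from most to least likely, and the cutoff is the first position where the running total reaches p.

```python
import numpy as np

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of most likely tokens with cumulative probability >= p (hypothetical helper)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    order = np.argsort(probs)[::-1]            # token indices, most likely first
    cumulative = np.cumsum(probs[order])       # running total in that order
    # side="left" returns the first index where cumulative >= p; +1 turns it into a count.
    cutoff = int(np.searchsorted(cumulative, p, side="left")) + 1
    cutoff = min(cutoff, probs.size)           # guard: float rounding can leave the total just below 1.0
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    # Renormalize so the kept probabilities sum to 1.
    return filtered / filtered.sum()

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
filtered = top_p_filter(probs, p=0.9)  # running totals 0.5, 0.7, 0.85, 0.95: keeps the top 4 tokens
```

Unlike the top-k version, the number of kept tokens here is computed from the data rather than fixed in advance.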

★ KEY TAKEAWAY

Top-k keeps a fixed number of tokens; top-p keeps a variable-size set with cumulative probability ≥ p, so it adapts to the distribution's shape.
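To make the takeaway concrete, here is a small sketch that only counts how many tokens each method keeps, using two made-up 8-token distributions (illustrative numbers, not from any model) with k = 5 and p = 0.90. The helper `kept_counts` is hypothetical.

```python
import numpy as np

def kept_counts(probs: np.ndarray, k: int, p: float) -> tuple[int, int]:
    """Return (#tokens kept by top-k, #tokens kept by top-p) for one distribution."""
    sorted_probs = np.sort(probs)[::-1]        # most likely first
    cumulative = np.cumsum(sorted_probs)
    n_top_p = min(int(np.searchsorted(cumulative, p, side="left")) + 1, probs.size)
    n_top_k = min(k, probs.size)
    return n_top_k, n_top_p

# Illustrative distributions: one dominated by a single token, one uniform.
peaked = np.array([0.93, 0.02, 0.02, 0.01, 0.01, 0.005, 0.003, 0.002])
flat = np.full(8, 0.125)

for name, dist in [("peaked", peaked), ("flat", flat)]:
    n_k, n_p = kept_counts(dist, k=5, p=0.90)
    print(f"{name}: top-k keeps {n_k}, top-p keeps {n_p}")
# peaked: top-k keeps 5, top-p keeps 1
# flat: top-k keeps 5, top-p keeps 8
```

Top-k keeps 5 tokens in both cases, while top-p shrinks to a single token on the peaked distribution and widens to all 8 on the flat one.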
