Advertisement
Each step: forward pass on current sequence; append sampled token.
What you're seeing
With KV cache: only the new token's K, V are computed each step. Past tokens reused from cache.
★ KEY TAKEAWAY
Autoregressive generation: predict next token, append, repeat. KV cache means each step is O(seq·d) not O(seq²).
▶ WHAT TO TRY
- Click Generate next token repeatedly.
- Notice that 'cached' tokens (gray) aren't re-processed.
- Only the newest token (green) is computed each step.