logits → softmax(/T) → distribution → pick (argmax or sample).
What you're seeing
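Final hidden state × W_out → logits ∈ ℝ^V (one score per vocabulary token). Divide the logits by temperature T, apply softmax to get a probability distribution, then pick a token.
Greedy (argmax): deterministic, can get stuck repeating itself. Sampling: diverse, can turn incoherent at high T.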
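The pipeline described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the visualization: the function names `softmax_with_temperature` and `pick`, the toy `hidden` vector, and the 3×4 `W_out` matrix are all made up for the example.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick(probs, mode="greedy", rng=random):
    """Choose a token index: argmax for greedy, a weighted draw for sample."""
    if mode == "greedy":
        return max(range(len(probs)), key=probs.__getitem__)
    if mode == "sample":
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    raise ValueError(f"unknown mode: {mode!r}")


# Toy numbers (hypothetical): hidden size 3, vocabulary size V = 4.
hidden = [0.5, -1.0, 2.0]
W_out = [
    [1.0, 0.0, -1.0, 0.5],
    [0.0, 1.0, 0.5, -0.5],
    [0.5, 0.5, 1.0, 0.0],
]

# logits[j] = sum_i hidden[i] * W_out[i][j], i.e. hidden × W_out
logits = [sum(h * w for h, w in zip(hidden, col)) for col in zip(*W_out)]

probs = softmax_with_temperature(logits, temperature=1.0)
rng = random.Random(0)  # seeded so the sampled pick is reproducible
greedy_token = pick(probs, "greedy")
sampled_token = pick(probs, "sample", rng)
print(logits, greedy_token, sampled_token)
```

Greedy always returns the index of the largest logit, whatever the temperature. Only the sampled pick depends on T.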
★ KEY TAKEAWAY
Divide the logits by T, apply softmax to get a distribution, then pick by argmax or by sampling. Temperature is the main creativity knob: low T sharpens the distribution, high T flattens it.
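Written out as a formula, the takeaway looks like this. The symbols z_i (the logit of token i) and p_i (its probability) are notation introduced here for illustration. V is the vocabulary size and T > 0 is the temperature.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{V} \exp(z_j / T)}, \qquad T > 0

% Limiting behaviour (assuming the largest logit is unique):
\lim_{T \to 0^{+}} p_i =
  \begin{cases}
    1 & \text{if } i = \arg\max_j z_j \\
    0 & \text{otherwise}
  \end{cases}
\qquad
\lim_{T \to \infty} p_i = \frac{1}{V}
```

As T approaches 0, sampling behaves like greedy. As T grows, every token becomes equally likely.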
▶ WHAT TO TRY
- Switch between Greedy and Sample.
- Drop temperature to 0.1: the distribution becomes sharply peaked, so Sample almost always agrees with Greedy.
- Raise to 2.0: the distribution flattens, so Sample becomes diverse and may pick low-probability tokens.
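The same experiments can be run outside the widget. The sketch below uses made-up logits and a hypothetical helper, `agreement_rate`, that measures how often a sampled pick matches the greedy pick. The numbers depend on the logits chosen and will not match the visualization exactly.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax (max subtracted for numerical stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def agreement_rate(logits, temperature, trials=2000, seed=0):
    """Fraction of sampled picks that equal the greedy (argmax) pick."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    greedy = max(range(len(probs)), key=probs.__getitem__)
    draws = rng.choices(range(len(probs)), weights=probs, k=trials)
    return sum(d == greedy for d in draws) / trials


logits = [3.0, 2.0, 1.0, 0.0]  # made-up scores for a 4-token vocabulary

for T in (0.1, 1.0, 2.0):
    top = max(softmax_with_temperature(logits, T))
    rate = agreement_rate(logits, T)
    print(f"T={T}: top-token prob={top:.4f}, sample==greedy rate={rate:.3f}")
```

For these logits the top token's probability is above 0.9999 at T = 0.1, roughly 0.64 at T = 1.0, and roughly 0.46 at T = 2.0. At T = 2.0 more than half of the draws land on a token other than the greedy choice.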