How
At each decoding step, compute the set of tokens the grammar allows in the current state. Set the logits of all other tokens to -∞. Sample from the renormalized remainder.
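A minimal sketch of the masking step in plain Python, assuming the grammar engine has already produced the legal token ids for the current state. The names mask_logits, softmax, and constrained_sample are hypothetical helpers for illustration, not any library's API.

```python
import math
import random


def mask_logits(logits, legal_ids):
    """Return a copy of logits with every token outside legal_ids set to -inf."""
    legal = set(legal_ids)
    if not legal:
        raise ValueError("grammar allows no tokens in this state")
    return [x if i in legal else -math.inf for i, x in enumerate(logits)]


def softmax(logits):
    """Softmax over a list of floats; -inf entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def constrained_sample(logits, legal_ids, rng=random):
    """Mask illegal tokens, renormalize, and sample one token id."""
    probs = softmax(mask_logits(logits, legal_ids))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy example: a 4-token vocabulary where the (assumed) grammar
# only allows the digit tokens at ids 1 and 3.
vocab = ["a", "1", "b", "2"]
logits = [2.0, 0.5, 3.0, 0.5]
token_id = constrained_sample(logits, legal_ids=[1, 3])
print(vocab[token_id])
```

Note that the model's top choices here (ids 0 and 2) are masked out, so sampling happens only over the two legal tokens, whatever the model would have preferred.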
Libraries
Outlines, guidance, LMQL, llama.cpp grammar mode, vLLM guided decoding. Between them they cover regex, JSON Schema, and CFG constraints; which of these each one supports varies by library.
Speed
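Usually cheap: the grammar step adds little per token, though large vocabularies or complex grammars can make it noticeable. But: the grammar can force the model onto tokens it rates as unlikely, which can degrade output quality.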