How

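At each decoding step, compute the set of tokens that are legal in the current grammar state. Set the logits of all other tokens to -∞, so the softmax gives them zero probability, then sample from the renormalized remainder. Advance the grammar state with the sampled token and repeat.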
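
A minimal sketch of the masking and sampling step, assuming the legal token ids for the current state are already known (how they are computed depends on the grammar). The function name `constrained_sample` and the toy logits are hypothetical; real libraries do this on GPU tensors inside the decoding loop.

```python
import numpy as np

def constrained_sample(logits, allowed_ids, rng):
    """Sample one token id, restricted to the grammar-legal allowed_ids.

    logits: 1-D float array of raw scores over the whole vocabulary.
    allowed_ids: token ids legal in the current grammar state.
    Returns (token_id, probs) where probs is the masked distribution.
    """
    allowed = np.asarray(sorted(allowed_ids), dtype=int)
    if allowed.size == 0:
        raise ValueError("grammar dead end: no legal tokens in this state")
    # Every illegal token gets -inf; legal tokens keep their logits.
    masked = np.full_like(logits, -np.inf)
    masked[allowed] = logits[allowed]
    # Softmax over the masked logits: exp(-inf) == 0, so illegal tokens get
    # exactly zero probability and the legal ones are renormalized.
    shifted = masked - masked[allowed].max()
    probs = np.exp(shifted)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs)), probs

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, 3.0])
token, probs = constrained_sample(logits, {1, 2}, rng)
```

Tokens 0 and 3 have the highest raw logits, yet they can never be sampled here; token 1 and token 2 share all the probability in proportion to exp(1.0) and exp(0.5).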

Libraries

Outlines, guidance, LMQL, llama.cpp grammar mode (GBNF), vLLM guided decoding. Each exposes some combination of regex, JSON Schema, and context-free grammar (CFG) constraints.
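
A common way to implement a regex constraint (the approach described for Outlines, for example) is to compile the pattern into a finite-state machine and precompute, for every state, which vocabulary tokens keep the output matchable. The sketch below does this for the toy regex `-?[0-9]+` with a hand-written DFA; the DFA, vocabulary, and the `step`/`build_index` names are hypothetical and much simpler than any real library's implementation.

```python
DIGITS = set("0123456789")
ACCEPTING = {2}  # state 2 = at least one digit seen, so the match is complete

def step(state, ch):
    """Character-level DFA for -?[0-9]+. Returns the next state, or None if dead."""
    if state == 0 and ch == "-":
        return 1
    if ch in DIGITS:
        return 2
    return None

def build_index(vocab, states, eos_id):
    """Map each DFA state to {token_id: next_state} for every token whose
    characters can all be consumed without hitting a dead state.
    Built once before decoding; each decoding step is then a dict lookup."""
    index = {}
    for s in states:
        allowed = {}
        for tid, tok in enumerate(vocab):
            if tid == eos_id:
                continue
            cur = s
            for ch in tok:  # multi-character tokens walk several DFA edges
                cur = step(cur, ch)
                if cur is None:
                    break
            if cur is not None:
                allowed[tid] = cur
        if s in ACCEPTING:
            allowed[eos_id] = s  # end-of-sequence only where the regex is satisfied
        index[s] = allowed
    return index

vocab = ["-", "1", "23", "-4", "a", "5b", "007", "<eos>"]
index = build_index(vocab, states=[0, 1, 2], eos_id=7)
```

During decoding, `index[state].keys()` is the legal set to pass to the masking step, and `index[state][token]` gives the next state. Note that `"5b"` is rejected even though it starts with a digit: a token is legal only if all of its characters fit the pattern.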

Speed

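Usually cheap: once the grammar is compiled (e.g. a regex into a finite-state machine with a precomputed state-to-allowed-tokens index), the per-token grammar step is small next to the model's forward pass; full CFG constraints can cost more. The catch is quality: the mask can force the model onto tokens it assigned low probability, pushing generation down paths it would not have chosen and degrading the output.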
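
One way to see when the constraint is fighting the model is to measure how much probability the unconstrained model put on the legal tokens before masking. The `legal_mass` helper below is a hypothetical diagnostic, not a feature of any of the libraries above; a value near 1 means the mask barely changes the distribution, a value near 0 means the grammar is overriding the model's preference.

```python
import numpy as np

def legal_mass(logits, allowed_ids):
    """Fraction of the unconstrained softmax probability on the legal tokens."""
    exp = np.exp(logits - logits.max())  # shift for numerical stability
    probs = exp / exp.sum()
    return float(probs[sorted(allowed_ids)].sum())

# The model strongly prefers token 0.
logits = np.array([4.0, 0.0, 0.0, 0.0])
agrees = legal_mass(logits, {0, 1})   # grammar allows the preferred token
fights = legal_mass(logits, {2, 3})   # grammar forbids it
```

In the second case the sampler still returns token 2 or 3 with certainty after renormalization, even though the model gave them only a few percent of its probability together; logging this value per step can show where constrained outputs drift.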