Setup

Curate a bank of 100 to 10k (input, output) examples. Embed the inputs and store the vectors in a vector DB. At inference: embed the query → retrieve the top-K most similar examples → include them in the prompt as demonstrations.
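
The steps above can be sketched in a few lines of standard-library Python. Everything named here is an assumption for illustration: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, a plain list plays the role of the vector DB, and `build_index`, `retrieve`, `build_prompt`, and the example bank are hypothetical.

```python
import math
import zlib

DIM = 256  # toy embedding size (assumption; real models use their own dimension)

def embed(text):
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def build_index(examples):
    """Offline step: embed every example input once."""
    return [embed(inp) for inp, _ in examples]

def retrieve(query, examples, index, k=2):
    """Inference step: embed the query and return the top-k most similar examples."""
    q = embed(query)
    ranked = sorted(range(len(examples)),
                    key=lambda i: cosine(q, index[i]), reverse=True)
    return [examples[i] for i in ranked[:k]]

def build_prompt(query, shots):
    """Place the retrieved examples ahead of the query as demonstrations."""
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in shots]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

# Hypothetical example bank (a real one would hold 100 to 10k entries).
EXAMPLES = [
    ("translate 'cat' to French", "chat"),
    ("translate 'dog' to French", "chien"),
    ("what is 2 + 2", "4"),
    ("what is 3 + 5", "8"),
]
INDEX = build_index(EXAMPLES)

query = "translate 'bird' to French"
shots = retrieve(query, EXAMPLES, INDEX, k=2)
prompt = build_prompt(query, shots)
print(prompt)
```

In a real setup the list scan in `retrieve` would be replaced by the vector DB's nearest-neighbor query, and `embed` by a call to the embedding model used to build the index.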

Diversity

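Naive kNN picks the K examples most similar to the query, and these tend to resemble each other, so the selected set is less diverse. MMR (Maximal Marginal Relevance) balances similarity to the query against diversity among the selected examples, which tends to generalize better.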
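
A minimal sketch of greedy MMR selection, to make the trade-off concrete. It uses the common formulation score = lam * sim(query, candidate) - (1 - lam) * max sim(candidate, already selected). The function name `mmr_select`, the weight `lam`, and the 2-D toy vectors are assumptions for illustration, not part of any particular library.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query_vec, cand_vecs, k, lam=0.5):
    """Greedily pick k candidate indices by Maximal Marginal Relevance.

    lam=1.0 reduces to plain kNN; lower values penalize redundancy more.
    """
    relevance = [cosine(query_vec, c) for c in cand_vecs]
    selected = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

def unit(angle_deg):
    """2-D unit vector at the given angle; toy data for the demo."""
    r = math.radians(angle_deg)
    return [math.cos(r), math.sin(r)]

query = unit(0)
candidates = [
    unit(20),   # 0: close to the query
    unit(22),   # 1: near-duplicate of candidate 0
    unit(-40),  # 2: less similar to the query, but unlike 0 and 1
]

knn_pick = sorted(range(len(candidates)),
                  key=lambda i: cosine(query, candidates[i]),
                  reverse=True)[:2]
mmr_pick = mmr_select(query, candidates, k=2, lam=0.5)
print(knn_pick, mmr_pick)  # [0, 1] [0, 2]
```

With these toy vectors, kNN returns the two near-duplicates, while MMR keeps the best match and swaps the duplicate for the dissimilar candidate. In practice MMR is usually run over a larger kNN shortlist rather than the whole bank.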

Cost

Each query costs one extra embedding call plus one retrieval. Both are small compared with the LLM call; retrieval typically takes under 50 ms.