Designing Search Autocomplete

Search autocomplete looks simple but has subtle requirements: sub-50ms latency, billions of queries served, real-time index updates, and personalization. The core algorithm is a trie (prefix tree) augmented with frequency scores and per-user re-ranking.

Advertisement

The trie

Tree where each node represents a character. Path from root to node = a query prefix. Each node stores the top-K most popular completions for that prefix. Lookup is O(prefix_length). At 100M completions, fits in ~10GB RAM.

Frequency updates

Every search query increments a counter for the matched completion. Periodically (every minute), rebuild the top-K lists at each trie node. Real-time isn't needed — humans don't notice 60-second-old autocomplete suggestions.

Advertisement

Personalization

Re-rank the top-K based on user signals: recent searches, click-through history, location. Cheap: load user vector at session start, dot-product against completion embeddings at query time. Adds ~5ms latency.

Geographic + locale variants

Trie per (country, language). 'pizza near' suggests differently in NYC vs Mumbai. Don't share tries across locales; the suggestions become noise. Heavy memory cost — but per-locale tries can be sharded across servers by hash.

Typo tolerance

Append edit-distance search for queries returning < N results. BK-tree or a 4-gram inverted index handles single typos in O(log N). Don't run on every query — only when the trie lookup is sparse.

Trie + per-node top-K + per-locale shards + user-signal rerank + typo fallback. 50ms p99 is achievable at billion-query scale.