Search autocomplete looks simple but has subtle requirements: sub-50ms latency, billions of queries served, real-time index updates, and personalization. The core algorithm is a trie (prefix tree) augmented with frequency scores and per-user re-ranking.
The trie
Tree where each node represents a character. Path from root to node = a query prefix. Each node stores the top-K most popular completions for that prefix. Lookup is O(prefix_length). At 100M completions, fits in ~10GB RAM.
Frequency updates
Every search query increments a counter for the matched completion. Periodically (every minute), rebuild the top-K lists at each trie node. Real-time isn't needed — humans don't notice 60-second-old autocomplete suggestions.
Personalization
Re-rank the top-K based on user signals: recent searches, click-through history, location. Cheap: load user vector at session start, dot-product against completion embeddings at query time. Adds ~5ms latency.
Geographic + locale variants
Trie per (country, language). 'pizza near' suggests differently in NYC vs Mumbai. Don't share tries across locales; the suggestions become noise. Heavy memory cost — but per-locale tries can be sharded across servers by hash.
Typo tolerance
Append edit-distance search for queries returning < N results. BK-tree or a 4-gram inverted index handles single typos in O(log N). Don't run on every query — only when the trie lookup is sparse.