Designing Search at Scale — Belgavi.AI Lab

'Build search' looks simple until you ship it. The architecture has at least four distinct services: indexer, query parser, retrieval, ranking. Personalization adds a fifth. Each layer has its own scaling story.

Indexing pipeline

Source changes (DB CDC, doc upload) → enrichment (extract, embed) → indexer → search backend (Elasticsearch, OpenSearch, Vespa). The pipeline is eventually consistent: new content becomes searchable seconds to minutes after the source changes.
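A minimal sketch of that flow, using in-memory stand-ins rather than a real backend client. ChangeEvent, InMemoryBackend, enrich and run_indexer are hypothetical names invented for illustration, and the enrichment step only tokenizes; a production pipeline would also compute embeddings at that point.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    """One source change, e.g. a CDC row update or a document upload."""
    doc_id: str
    text: str
    deleted: bool = False


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for the search backend; not a real client."""
    docs: dict = field(default_factory=dict)

    def upsert(self, doc_id, doc):
        self.docs[doc_id] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


def enrich(event):
    """Extraction step: tokenize the text (embedding is omitted here)."""
    tokens = event.text.lower().split()
    return {"text": event.text, "tokens": tokens, "length": len(tokens)}


def run_indexer(events, backend):
    """Apply change events in order; later events for the same doc win."""
    for event in events:
        if event.deleted:
            backend.delete(event.doc_id)
        else:
            backend.upsert(event.doc_id, enrich(event))


backend = InMemoryBackend()
run_indexer(
    [
        ChangeEvent("a", "Search at scale"),
        ChangeEvent("b", "Draft note"),
        ChangeEvent("a", "Search at scale, revised"),
        ChangeEvent("b", "", deleted=True),
    ],
    backend,
)
print(sorted(backend.docs))
```

Because events are applied in order and keyed by document id, replaying the same stream produces the same index state, which is what makes an eventually consistent pipeline safe to retry.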

Retrieval vs ranking

Retrieval: pull a candidate set (for example, the top 1000) from the index. Ranking: re-rank those candidates with richer features (user signals, ML scores). The two are often separate services; ranking is the slower side and the one teams iterate on most.
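The two-stage split can be sketched as follows. This is an illustration under simplifying assumptions, not a production scorer: retrieval scores by plain term overlap, ranking adds a made-up click boost with an arbitrary weight of 0.5, and the function names and toy index are hypothetical.

```python
def retrieve(index, query_terms, k=1000):
    """Cheap first stage: score by term overlap and keep the top k candidates."""
    scored = []
    for doc_id, tokens in index.items():
        overlap = len(set(query_terms) & set(tokens))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by doc id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def rank(candidates, index, query_terms, user_clicks):
    """Richer second stage, run over the small candidate set only."""
    def score(doc_id):
        overlap = len(set(query_terms) & set(index[doc_id]))
        # Assumed user signal: past clicks on this doc, weighted at 0.5.
        return overlap + 0.5 * user_clicks.get(doc_id, 0)

    return sorted(candidates, key=lambda doc_id: (-score(doc_id), doc_id))


index = {
    "d1": ["search", "ranking", "basics"],
    "d2": ["search", "at", "scale"],
    "d3": ["cooking", "pasta"],
}
query = ["search", "scale"]

candidates = retrieve(index, query, k=2)
results = rank(candidates, index, query, user_clicks={"d1": 4})
print(candidates, results)
```

Retrieval alone puts d2 first on overlap; the user signal in the ranking stage moves d1 ahead. The expensive scoring only ever touches the k candidates, which is why the stages can be sized and deployed separately.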

Freshness vs quality

Real-time index (every doc indexed immediately): great freshness, slower queries. Bulk index (daily): faster queries, stale results. Hybrid: real-time for hot content, bulk for archive. Most production systems land here.
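One way the hybrid tier might look in code, as a rough sketch. The seven-day HOT_WINDOW cutoff, the function names, and the dict-based indexes are all assumptions made for illustration; real systems pick the cutoff from their own traffic and typically merge results inside the search backend.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)  # assumed cutoff between hot and archive content


def choose_tier(updated_at, now):
    """Route recently changed docs to the real-time index, the rest to bulk."""
    return "realtime" if now - updated_at <= HOT_WINDOW else "bulk"


def search_hybrid(query_term, realtime_index, bulk_index):
    """Query both tiers; the real-time copy wins when a doc is in both."""
    merged = dict(bulk_index)
    merged.update(realtime_index)  # fresher version overrides the bulk copy
    return sorted(doc_id for doc_id, text in merged.items() if query_term in text)


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
hot_tier = choose_tier(datetime(2024, 1, 8, tzinfo=timezone.utc), now)
cold_tier = choose_tier(datetime(2023, 12, 1, tzinfo=timezone.utc), now)

realtime_index = {"a": "search at scale v2"}
bulk_index = {"a": "old text", "b": "scale out the archive"}
hits = search_hybrid("scale", realtime_index, bulk_index)
print(hot_tier, cold_tier, hits)
```

Letting the real-time copy override the bulk copy keeps a recently edited document from surfacing with stale text until the next bulk rebuild.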

To recap: indexing pipeline, retrieval, ranking, and a freshness tier. Four pieces, each of which can scale independently.