Designing a Distributed Cache (Memcached/Redis-style)

Distributed caches like Memcached and Redis Cluster shard data across nodes and survive node failures. The design questions are: how to partition keys, how to replicate, how to evict, and how to add capacity without massive cache misses.

Advertisement

Consistent hashing

Naïve sharding (hash(key) % N) means adding one node remaps ALL keys → cache miss storm. Consistent hashing places nodes on a hash ring at multiple virtual positions; adding a node remaps only 1/N of keys to it. Virtual nodes (~100 per physical node) smooth the load.

Replication

Two patterns. Primary-replica (Redis Cluster): each shard has a primary + N replicas. Reads can hit replicas. Multi-primary (Cassandra-style): every node accepts writes, conflicts resolved by timestamp. For cache use cases, primary-replica is simpler and almost always sufficient.

Advertisement

Eviction policies

LRU: least recently used. Default for Memcached. Good for general workloads. LFU: least frequently used. Better when access patterns are heavy-tailed (~80/20). TTL-only: no eviction, just expiry. Use when you can size for the working set.

Cache stampede protection

100 servers simultaneously miss a key, all 100 hit the DB. Fix: locked recompute. The first miss writes a 'computing' marker; other misses for the same key wait or serve stale. Redis: SET key value NX EX 60 for the lock.

Adding capacity safely

Don't repartition under load. Use a 'shadow' phase: dual-write new + old shards while reads still hit old. Compare results, switch reads, then drop old shard. Migration takes minutes-hours but cache miss rate stays at baseline.

Consistent hashing + primary-replica + LRU eviction + stampede locks + shadow-migrate when growing. That's 90% of distributed caches.