Distributed caches like Memcached and Redis Cluster shard data across nodes and survive node failures. The design questions are: how to partition keys, how to replicate, how to evict, and how to add capacity without massive cache misses.
Consistent hashing
Naïve sharding (hash(key) % N) means adding one node remaps ALL keys → cache miss storm. Consistent hashing places nodes on a hash ring at multiple virtual positions; adding a node remaps only 1/N of keys to it. Virtual nodes (~100 per physical node) smooth the load.
Replication
Two patterns. Primary-replica (Redis Cluster): each shard has a primary + N replicas. Reads can hit replicas. Multi-primary (Cassandra-style): every node accepts writes, conflicts resolved by timestamp. For cache use cases, primary-replica is simpler and almost always sufficient.
Eviction policies
LRU: least recently used. Default for Memcached. Good for general workloads. LFU: least frequently used. Better when access patterns are heavy-tailed (~80/20). TTL-only: no eviction, just expiry. Use when you can size for the working set.
Cache stampede protection
100 servers simultaneously miss a key, all 100 hit the DB. Fix: locked recompute. The first miss writes a 'computing' marker; other misses for the same key wait or serve stale. Redis: SET key value NX EX 60 for the lock.
Adding capacity safely
Don't repartition under load. Use a 'shadow' phase: dual-write new + old shards while reads still hit old. Compare results, switch reads, then drop old shard. Migration takes minutes-hours but cache miss rate stays at baseline.