In distributed systems like Cassandra and DynamoDB, partitioning data evenly across a cluster is a fundamental challenge. A naive approach using hash(key) % N (where N is the number of nodes) works fine until you need to add or remove a node. When N changes, almost every key maps to a new node, causing massive data reshuffling.
Imagine a cluster with 4 nodes. Key 'A' hashes to 10. 10 % 4 = 2, so it goes to Node 2. If we add a 5th node, 10 % 5 = 0. The key moves to Node 0. In fact, nearly N/(N+1) of keys will move.
Consistent Hashing solves this by decoupling the data partition from the number of physical nodes. It maps both data and nodes onto a circular "ring" (conceptually 0 to 2^32 or similar).
Basic consistent hashing can still lead to "hot spots" if nodes aren't evenly spaced or if one node is more powerful. Virtual Nodes fix this by assigning multiple tokens to a single physical node. This spreads the load more evenly and speeds up rebalancing.
Interactive Visualization: Add/Remove nodes to see how keys redistribute.