In a large decentralized cluster like Cassandra, maintaining a global state (which nodes are up, down, or joining) without a central master is tricky. This is where the Gossip Protocol (or Epidemic Protocol) shines.
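Imagine a cocktail party. You tell a rumor to 3 random friends. In the next round, each of them tells 3 random friends. The number of people who know the rumor grows roughly exponentially from round to round, so the whole party hears it after only a handful of rounds (on the order of $\log N$ rounds for $N$ guests). Distributed systems use the same mechanism to propagate state updates.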
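The cocktail-party picture can be checked with a few lines of Python. This is a minimal sketch of push-style gossip, not Cassandra's implementation: the function name `simulate_rumor`, the `fanout` parameter, and the choice to let a node pick itself or an already-informed peer are all assumptions made for illustration.

```python
import random


def simulate_rumor(n_nodes: int, fanout: int = 3, seed: int = 0) -> list[int]:
    """Return how many nodes know the rumor after each round of push gossip."""
    rng = random.Random(seed)
    informed = {0}          # node 0 starts the rumor
    history = [1]           # history[r] = informed count after round r
    while len(informed) < n_nodes:
        newly_told = set()
        for _node in sorted(informed):
            # Each informed node tells `fanout` distinct random nodes.
            # Picks may be wasted on nodes that already know (assumption).
            newly_told.update(rng.sample(range(n_nodes), fanout))
        informed |= newly_told
        history.append(len(informed))
    return history


if __name__ == "__main__":
    for round_no, count in enumerate(simulate_rumor(1000)):
        print(f"round {round_no}: {count} nodes informed")
```

The count can at most quadruple per round with a fanout of 3 (each informed node adds up to three new ones), and growth slows near the end because most picks land on nodes that already know the rumor.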
Every second (or at a configurable interval), a node selects a few random peers and exchanges information with them. The two sides compare their knowledge of the cluster state (using version numbers or vector clocks), and each keeps the newer entry wherever they differ.
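The compare-and-merge step might look like the following sketch. It assumes the simplest scheme, a single integer version per node where the higher version wins; `EndpointState` and `merge_state` are hypothetical names for this example, and a real system would typically carry more metadata per entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointState:
    status: str    # e.g. "UP", "DOWN", "JOINING"
    version: int   # bumped by the owning node on every change (assumption)


def merge_state(
    local: dict[str, EndpointState],
    remote: dict[str, EndpointState],
) -> dict[str, EndpointState]:
    """Return a new view that keeps the higher-versioned entry per node."""
    merged = dict(local)  # copy, so the caller's view is not mutated
    for node, state in remote.items():
        known = merged.get(node)
        if known is None or state.version > known.version:
            merged[node] = state
    return merged


if __name__ == "__main__":
    mine = {"n1": EndpointState("UP", 5), "n2": EndpointState("UP", 2)}
    theirs = {"n2": EndpointState("DOWN", 7), "n3": EndpointState("JOINING", 1)}
    print(merge_state(mine, theirs))
```

Because the rule is "newest version wins", both peers end up with the same view regardless of who initiated the exchange, which is what lets repeated random exchanges converge.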
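Cassandra uses Gossip not just for state but also for failure detection. Instead of a binary "up/down" verdict, it uses a Phi Accrual failure detector, which computes a suspicion level ($\Phi$) from the history of heartbeat intervals. $\Phi$ is not itself a probability: it is a value on a logarithmic scale that grows the longer a heartbeat is overdue relative to what that node's past intervals predict, and the node is treated as down once $\Phi$ crosses a configurable threshold. Because the estimate is built from observed intervals, it adapts to network latency automatically.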
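To make the idea concrete, here is a minimal sketch of a $\Phi$ calculation. It assumes heartbeat intervals are exponentially distributed, a simplification chosen for this example, so that $\Phi(t) = -\log_{10} P(\text{interval} > t) = t / (\mu \ln 10)$, where $\mu$ is the mean observed interval. The class name, the window size, and the default threshold value are illustrative assumptions, not taken from any particular implementation.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Suspicion level from heartbeat history (exponential-interval assumption)."""

    def __init__(self, window: int = 1000):
        self.intervals = deque(maxlen=window)  # sliding window of recent gaps
        self.last_heartbeat = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Return -log10 of the chance that a live node would be this late."""
        if not self.intervals:
            return 0.0  # no history yet, so no grounds for suspicion
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

    def is_suspect(self, now: float, threshold: float = 8.0) -> bool:
        # `threshold` is a hypothetical default for this sketch.
        return self.phi(now) > threshold


if __name__ == "__main__":
    detector = PhiAccrualDetector()
    for t in (0.0, 1.0, 2.0, 3.0):
        detector.heartbeat(t)
    for now in (3.5, 5.0, 10.0, 25.0):
        print(f"t={now}: phi={detector.phi(now):.2f}")
```

A node on a slow link has a larger mean interval, so the same silence produces a lower $\Phi$ for it than for a node that normally answers quickly. That is the sense in which the detector adapts to latency.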
Below is a simulation of Gossip infection. Click "Start Gossip" to see how a single piece of information spreads through the cluster.