Kafka first shipped a mode that runs without an external ZooKeeper ensemble in version 2.8 (KRaft mode, early access there and production-ready in 3.3). Removing ZooKeeper simplifies operations and shortens controller failover. Migrating a production cluster is non-trivial, but the gains are large.

What ZooKeeper did

ZooKeeper stored the cluster metadata (which brokers exist, topic configurations, partition assignments) and coordinated controller election. It ran as a separate ensemble, typically 3 or 5 nodes: one more system to monitor, patch, secure, and scale.

How KRaft replaces it

Metadata is stored in an internal Kafka topic, __cluster_metadata, replicated via Raft. A small set of nodes, the controllers, forms the Raft quorum; these can be dedicated nodes or brokers that also run the controller role. There is no external dependency. Controller failover time drops from roughly 10s to roughly 1s.
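
As a rough illustration of what "forming the Raft quorum" implies for sizing, the sketch below parses a voter list in the id@host:port format used by the controller.quorum.voters setting and computes how many controllers can fail while a majority remains. The host names and the helper functions are made up for this example and are not part of Kafka.

```python
# Sketch: quorum arithmetic for a KRaft controller quorum.
# The "id@host:port" format follows controller.quorum.voters;
# the host names and helper names are hypothetical.

def parse_voters(voters: str) -> dict:
    """Parse 'id@host:port,...' into {node_id: (host, port)}."""
    result = {}
    for entry in voters.split(","):
        node_id, _, endpoint = entry.strip().partition("@")
        host, _, port = endpoint.rpartition(":")
        result[int(node_id)] = (host, int(port))
    return result


def majority(n: int) -> int:
    """Smallest number of voters that forms a Raft majority."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of voters that can fail while a majority remains."""
    return n - majority(n)


voters = parse_voters("1@ctrl-1:9093,2@ctrl-2:9093,3@ctrl-3:9093")
print(len(voters), majority(len(voters)), tolerated_failures(len(voters)))
# prints: 3 2 1
```

A 3-node quorum survives one controller failure and a 5-node quorum survives two. An even-sized quorum adds a node without adding failure tolerance, which is why odd sizes are the usual choice.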

Migration path (3.5+)

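1. Provision a KRaft-mode controller quorum alongside the existing ZK-mode cluster.
2. Set zookeeper.metadata.migration.enable=true on the brokers, point them at the controller quorum, and rolling-restart them.
3. Once the brokers have registered with the quorum, the KRaft controller copies the ZK state into the KRaft metadata log.
4. Rolling-restart the brokers in KRaft mode.
5. Retire the ZooKeeper ensemble.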
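
Step 2 is mostly a configuration change, so a small pre-flight check can catch a broker that is not ready for it. The sketch below is only an illustration: the property names are real Kafka settings, but the hosts, IDs, helper functions, and the list of checked keys are assumptions, and a real broker needs more than this (for example, a listener.security.protocol.map entry for the controller listener). The parser handles plain key=value lines only, not the full Java properties syntax.

```python
# Sketch of a pre-flight check for step 2 of the migration.
# Hosts, IDs, and helper names are hypothetical; the checked keys are
# a minimal subset, not an exhaustive list.

BROKER_PROPS = """
# existing ZK-mode settings stay in place during the migration
broker.id=0
zookeeper.connect=zk-1:2181,zk-2:2181,zk-3:2181
# settings added for the migration
zookeeper.metadata.migration.enable=true
controller.quorum.voters=1001@ctrl-1:9093,1002@ctrl-2:9093,1003@ctrl-3:9093
controller.listener.names=CONTROLLER
"""

REQUIRED = (
    "zookeeper.connect",
    "controller.quorum.voters",
    "controller.listener.names",
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def migration_problems(props: dict) -> list:
    """Return human-readable problems; an empty list means the check passed."""
    problems = [f"missing {key}" for key in REQUIRED if not props.get(key)]
    if props.get("zookeeper.metadata.migration.enable", "").lower() != "true":
        problems.append("zookeeper.metadata.migration.enable must be true")
    return problems


print(migration_problems(parse_properties(BROKER_PROPS)))
# prints: []
```

Running a check like this against each broker's configuration before the rolling restart in step 2 is cheaper than discovering a misconfigured broker halfway through the migration.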

Operational differences

Backup: snapshot the metadata log rather than taking a ZK dump. Monitoring: Kafka's existing JMX metrics expose controller health, so there is no separate ZooKeeper monitoring stack. Recovery: a corrupted metadata log means restoring from a snapshot and replaying log entries, the same shape as topic recovery.

When to migrate

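New clusters: start in KRaft mode (the only mode from 4.0 onward, which removes ZooKeeper support). Existing clusters with more than 10 brokers and an active operations team: migrate in a scheduled maintenance window. Small clusters or clusters without on-call coverage: wait for fully automated migration tooling (in active development), keeping in mind that the migration must be completed before upgrading to 4.0.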

KRaft removes ZooKeeper as a dependency and improves controller failover. Migrate existing clusters when you have on-call coverage, and start new clusters in KRaft mode.