Change Data Capture (CDC) streams every row-level change in a database to downstream systems. Common uses are cache invalidation, materialized views, downstream analytics, and cross-system replication. By 2026 the tooling is robust, but the architectural choices still matter.

Log-based CDC — the right way

Tail the database's write-ahead log (Postgres WAL, MySQL binlog, MongoDB oplog). Every committed change is emitted as an event. Transaction atomicity is preserved, and there is no polling overhead on the source tables. Debezium is the standard implementation and connects to most databases.
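To make the event model concrete, here is a minimal sketch of a consumer applying log-derived change events to an in-memory replica. It assumes events shaped like Debezium's envelope (an `op` code of `c`, `u`, `d`, or `r`, plus `before` and `after` row images). The `apply_change` helper, the `id` key field, and the sample rows are hypothetical and exist only for illustration.

```python
from typing import Any

Row = dict[str, Any]


def apply_change(replica: dict[Any, Row], event: dict[str, Any], key_field: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory replica.

    Assumes the envelope carries "op", "before" and "after"; key_field is a
    hypothetical primary-key column name.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: the "after" image is the new row state
        row = event["after"]
        replica[row[key_field]] = row
    elif op == "d":
        # delete: only the "before" image identifies the row
        replica.pop(event["before"][key_field], None)
    else:
        raise ValueError(f"unknown op code: {op!r}")


# Events arrive in log order, so replaying them reproduces the source state.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
     "after": {"id": 1, "email": "b@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "c@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "c@example.com"}, "after": None},
]

replica: dict[Any, Row] = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

The delete of row 2 reaches the consumer as an explicit event, which is the property polling cannot provide.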

Polling-based CDC — the legacy way

Run SELECT * FROM table WHERE updated_at > X on a timer. It misses deletes, because a deleted row no longer matches any query. It adds load to the source DB. It has time-window quirks when clocks are not synchronized. Don't use it; the log-based connectors exist.
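The missed-delete problem is easy to reproduce. The sketch below uses an in-memory SQLite database as a stand-in for the source; the `users` table, its integer `updated_at` column, and the `poll` helper are assumptions made for the demonstration, not part of any real tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, updated_at INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a@example.com", 100), (2, "b@example.com", 100)],
)


def poll(conn: sqlite3.Connection, watermark: int):
    """Return rows changed since the watermark, plus the advanced watermark."""
    rows = conn.execute(
        "SELECT id, email, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (watermark,),
    ).fetchall()
    new_watermark = max((r[2] for r in rows), default=watermark)
    return rows, new_watermark


first, watermark = poll(conn, 0)  # sees both rows

# Between polls: two updates to row 1 and a hard delete of row 2.
conn.execute("UPDATE users SET email = 'x@example.com', updated_at = 110 WHERE id = 1")
conn.execute("UPDATE users SET email = 'y@example.com', updated_at = 120 WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 2")

second, watermark = poll(conn, watermark)
print(second)  # only the latest state of row 1; the delete is invisible
```

The second poll returns a single row for id 1. The delete of row 2 never shows up, and the intermediate update to `x@example.com` is collapsed away, so a downstream copy would keep row 2 forever.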

Debezium ecosystem

Open-source, Kafka Connect-based connectors for Postgres, MySQL, MongoDB, SQL Server, and Oracle. Robust and mature, but complex to operate. It is the community standard. Hosted versions are available from Confluent, Aiven, and Decodable.
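As a rough idea of what operating a connector involves, the sketch below builds the JSON payload one would POST to the Kafka Connect REST API (`/connectors`) to register a Debezium Postgres connector. The property names follow Debezium 2.x as far as I know (`topic.prefix` replaced the older `database.server.name`). The `build_postgres_connector` helper and every host, user, slot, and table name are placeholders, so check the property list against the Debezium version you actually run.

```python
import json


def build_postgres_connector(name: str, host: str, dbname: str, tables: list[str]) -> dict:
    """Build a Kafka Connect registration payload for a Debezium Postgres connector.

    All values are illustrative placeholders, not recommended settings.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",           # Postgres built-in logical decoding plugin
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "cdc_user",         # placeholder role with replication rights
            "database.password": "change-me",    # placeholder; keep real secrets out of code
            "database.dbname": dbname,
            "topic.prefix": name,                # prefix for the Kafka topics it writes
            "slot.name": f"{name}_slot",         # replication slot created on the source
            "table.include.list": ",".join(tables),
            "snapshot.mode": "initial",          # snapshot existing rows, then stream the WAL
        },
    }


payload = build_postgres_connector(
    "inventory", "db.internal.example", "shop", ["public.users", "public.orders"]
)
print(json.dumps(payload, indent=2))
```

The payload is only printed here; sending it requires a running Kafka Connect worker with the Debezium plugin installed.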

Native database CDC

Some databases ship native CDC: Postgres logical replication (no Debezium needed), Spanner change streams, CockroachDB CDC (changefeeds), Cassandra CDC. These are simpler to operate when available, but have a smaller ecosystem than Debezium.

Operational patterns

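Start with an initial snapshot, then switch to ongoing replication from the log. Propagate schema evolution through Avro or Protobuf schemas in the stream. Emit tombstones for deletes. On Postgres, watch replication slot retention: if the CDC consumer falls behind, the slot holds WAL back, WAL piles up, and the disk fills. Monitor lag.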
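Slot retention can be checked with simple arithmetic on log sequence numbers (LSNs). A Postgres LSN prints as two hexadecimal halves, `hi/lo`, and the WAL a slot holds back is the byte distance between the current LSN and the slot's `restart_lsn`. The sketch below assumes rows fetched with the query shown in `SLOT_QUERY`; the slot names, sample LSNs, and the 1 GiB threshold are made up for illustration.

```python
# Query one might run on the primary (Postgres 10+) to feed check_slots().
SLOT_QUERY = """
SELECT slot_name, active, restart_lsn, pg_current_wal_lsn() AS current_lsn
FROM pg_replication_slots
"""


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/16B3748' to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the server must keep for a slot."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


def check_slots(rows, max_bytes: int):
    """Return (slot_name, active, retained_bytes) for slots over the threshold."""
    alerts = []
    for slot_name, active, restart_lsn, current_lsn in rows:
        if restart_lsn is None:  # slot has not reserved any WAL yet
            continue
        retained = retained_wal_bytes(current_lsn, restart_lsn)
        if retained > max_bytes:
            alerts.append((slot_name, active, retained))
    return alerts


# Hypothetical rows, as SLOT_QUERY might return them.
rows = [
    ("debezium_orders", True, "0/16B3748", "0/16B6C50"),  # healthy consumer
    ("debezium_stale", False, "0/1000000", "2/1000000"),  # consumer is gone
]

alerts = check_slots(rows, max_bytes=1 << 30)  # alert above 1 GiB
print(alerts)
```

An inactive slot with growing retention is the usual culprit behind a full disk. Postgres 13 and later also offer `max_slot_wal_keep_size` as a server-side cap, at the cost of invalidating a slot that falls too far behind.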

In short: use log-based CDC, via Debezium or a native feature. Avoid polling. Watch slot and WAL retention. Track lag.