Change Data Capture (CDC) turns your database's transaction log into Kafka topics, by default one per table. Every committed INSERT, UPDATE, and DELETE becomes an event that downstream consumers can react to in near real time, without polling the database. Debezium, an open-source set of Kafka Connect source connectors, is a widely used implementation.
How it taps the log
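Debezium reads each database's native change log instead of querying tables: the MySQL binlog, the Postgres WAL via logical replication, and MongoDB change streams (which are built on the oplog). There are no triggers, no polling, and no application changes. Once streaming, the source database sees each connector as a single replication client.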
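As a minimal sketch, assuming Debezium 2.x property names and a Kafka Connect worker whose REST API is reachable at some URL you supply, pointing a connector at the Postgres WAL is a single POST to the worker's /connectors endpoint. The helper names, hostname, credentials, and slot name below are illustrative placeholders, not values from any real deployment.

```python
import json
import urllib.request


def build_postgres_connector(name: str, host: str, dbname: str,
                             slot: str, tables: list[str]) -> dict:
    """Build a Kafka Connect payload for a Debezium Postgres connector (hypothetical helper).

    Property names assume Debezium 2.x; credentials are placeholders.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "debezium",       # hypothetical replication user
            "database.password": "change-me",  # placeholder; use a secrets provider in practice
            "database.dbname": dbname,
            "topic.prefix": name,              # topics are named <prefix>.<schema>.<table>
            "plugin.name": "pgoutput",         # logical decoding plugin built into Postgres 10+
            "slot.name": slot,                 # one dedicated replication slot per connector
            "table.include.list": ",".join(tables),
        },
    }


def register(connect_url: str, payload: dict) -> dict:
    """POST the payload to the Kafka Connect REST API and return its JSON response."""
    req = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would look like register("http://connect:8083", build_postgres_connector("orders", "db.internal", "orders_db", "orders_slot", ["public.orders"])), where 8083 is Kafka Connect's default REST port. Naming the slot explicitly, rather than relying on the shared default slot name, makes it easier to monitor and to drop when the connector is retired.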
Event shape
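```json
{
  "op": "u",
  "ts_ms": 1719388800000,
  "source": { "db": "orders_db", "table": "orders", "txId": 42 },
  "before": { "id": 7, "status": "pending" },
  "after": { "id": 7, "status": "shipped" }
}
```
op is c (create), u (update), d (delete), or r (read, used for rows captured by a snapshot). before is null for creates and after is null for deletes. On Postgres, before carries the full previous row on updates only when the table uses REPLICA IDENTITY FULL.
Schema evolution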
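Debezium does not require a schema registry: with Kafka Connect's JsonConverter, each message carries its own schema. Configure an Avro converter backed by Confluent Schema Registry (or Apicurio Registry) instead, and each ALTER TABLE registers a new schema version; consumers using Avro deserializers then absorb backward-compatible changes, such as an added nullable column, automatically. Breaking changes, such as a column rename or a type change, require coordinating producers and consumers.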
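Confluent Schema Registry's default compatibility mode is BACKWARD: a consumer on the new schema must be able to read data written with the old one. The sketch below approximates that check for flat record schemas. The Field type, its type labels, and the function name are hypothetical, and real Avro schema resolution also handles type promotions (for example int to long), unions, and aliases, which this version ignores.

```python
from dataclasses import dataclass
from typing import Any

NO_DEFAULT: Any = object()  # sentinel, because None (Avro null) is a legitimate default


@dataclass(frozen=True)
class Field:
    name: str
    type: str                  # simplified type label, e.g. "long" or "optional<string>"
    default: Any = NO_DEFAULT


def backward_compat_problems(old: list[Field], new: list[Field]) -> list[str]:
    """List reasons a reader using `new` could fail on data written with `old`.

    Simplified BACKWARD check: ignores type promotions, unions and aliases.
    """
    old_by_name = {f.name: f for f in old}
    problems = []
    for field in new:
        prior = old_by_name.get(field.name)
        if prior is None:
            # The reader expects a field the old data lacks, so it needs a default to fill in.
            if field.default is NO_DEFAULT:
                problems.append(f"added field '{field.name}' has no default")
        elif prior.type != field.type:
            problems.append(f"field '{field.name}' changed type {prior.type} -> {field.type}")
    # Fields present only in `old` are fine: the reader simply ignores them.
    return problems
```

A rename shows up as a removed field plus an added field without a default, which is why this check flags it as breaking while an added nullable column with a null default passes.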
Production gotchas
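The initial snapshot of a large table can take hours, and on Postgres the replication slot holds back WAL recycling for its whole duration, so schedule it for a low-traffic window. Afterwards, a stopped or lagging connector keeps the slot pinned, and WAL accumulates on the primary until it can fill the disk; alert on slot lag. Lag in downstream Kafka consumers does not pin WAL; only the connector's own progress does. For Postgres, set wal_level = logical, size max_wal_senders and max_replication_slots to cover every connector, and give each connector its own dedicated replication slot.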
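One way to alert on a pinned slot, sketched under the assumption that a monitoring job can query the primary: the pg_replication_slots view exposes each slot's restart_lsn and confirmed_flush_lsn, and pg_wal_lsn_diff (Postgres 10+) turns them into byte distances. The Python helpers repeat the same arithmetic on the text LSN format, whose two hex numbers are the high and low 32 bits of a 64-bit WAL position. The 10 GiB threshold and the function names are hypothetical.

```python
from typing import Optional

# Run against the primary; pg_current_wal_lsn and pg_wal_lsn_diff need Postgres 10+.
SLOT_LAG_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)         AS retained_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS unconfirmed_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


def parse_lsn(lsn: str) -> int:
    """Convert a text LSN such as '16/B374D848' into an absolute WAL byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """WAL the slot keeps on disk: distance from its restart_lsn to the current write position."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


def slot_alert(slot: str, current_lsn: str, restart_lsn: str,
               threshold_bytes: int = 10 * 1024**3) -> Optional[str]:
    """Return an alert message if the slot retains more WAL than the (hypothetical) threshold."""
    retained = retained_wal_bytes(current_lsn, restart_lsn)
    if retained > threshold_bytes:
        return f"replication slot {slot} retains {retained} bytes of WAL"
    return None
```

On Postgres 13+, max_slot_wal_keep_size can also cap retention, at the cost of invalidating the slot (and forcing the connector to resnapshot) once the cap is exceeded.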
Use cases
Search index sync (Postgres → Elasticsearch). Cache invalidation (DB → Redis). Audit logging. Multi-region replication. Event-driven microservices, where each service reacts to database changes. Because consumers depend on the log rather than on the application that writes the data, CDC adds asynchronous pipelines with low coupling and without application code changes.
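For the search-index and cache use cases, a consumer typically applies each event as an upsert or a delete keyed by primary key. The sketch below assumes JSON-serialized record values (with or without the JsonConverter schema/payload wrapper) and a single-column primary key named id, and it uses a plain dict as a stand-in for Elasticsearch or Redis; apply_change is a hypothetical name.

```python
import json
from typing import Any, Optional


def apply_change(index: dict[Any, dict], raw_value: Optional[bytes],
                 key_field: str = "id") -> None:
    """Apply one Debezium change event to an in-memory index keyed by primary key."""
    if raw_value is None:
        # Tombstone: Debezium follows each delete with a null-value record for log compaction.
        return
    event = json.loads(raw_value)
    if "payload" in event:
        # JsonConverter with schemas enabled wraps the envelope as {"schema": ..., "payload": ...}.
        event = event["payload"]
    op = event["op"]
    if op in ("c", "u", "r"):
        row = event["after"]
        index[row[key_field]] = row  # upsert the full after-image of the row
    elif op == "d":
        # On deletes, "before" typically holds at least the key columns.
        index.pop(event["before"][key_field], None)
    # Other ops (for example truncate) are ignored in this sketch.
```

Because each event carries the row's full new state rather than a delta, replaying events after a consumer restart converges to the same index, provided events for the same key are processed in order. Debezium keys messages by primary key, so they land in the same partition, where Kafka preserves order.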