Change Data Capture (CDC)

Your order service writes an order to PostgreSQL. A separate search service needs that order in Elasticsearch so users can search their order history. A cache service needs it in Redis for fast lookups. An analytics pipeline needs it in the data warehouse for reporting. The naive approach: the order service publishes an event to Kafka after every database write. But what happens when the database write succeeds and the Kafka publish fails? The order exists in the database but is invisible to search, cache, and analytics. Or worse — the Kafka publish succeeds but the database transaction rolls back. Now search shows an order that doesn’t exist. You can’t reliably keep two systems in sync by writing to both from application code. CDC solves this by reading changes directly from the database’s own replication log — the single source of truth — and streaming them to downstream systems.

The Problem: Dual Writes Are Unreliable

The “dual write” problem appears whenever application code tries to update two systems atomically:

sequenceDiagram
    participant App as Order Service
    participant DB as PostgreSQL
    participant K as Kafka

    App->>DB: INSERT INTO orders (...)
    DB-->>App: OK (committed)

    App->>K: Publish OrderCreated event
    Note over App,K: Network timeout — did Kafka receive it?

    alt Kafka received it
        Note over K: Event published ✓
But app doesn't know → might retry → duplicate
    else Kafka didn't receive it
        Note over K: Event lost ✗
DB has the order, Kafka doesn't
Search/cache/analytics never learn about it
    end

Failure scenarios for dual writes:

  1. DB succeeds, Kafka fails:
     → Order in DB, not in search/cache/analytics
     → Data inconsistency until someone notices

  2. DB fails, Kafka succeeds:
     → Event published for an order that doesn't exist
     → Search shows a phantom order

  3. DB succeeds, Kafka succeeds, but app crashes before confirming:
     → App retries on restart → duplicate event published
     → Downstream systems process the order twice

  4. DB succeeds, app publishes to Kafka, then DB transaction rolls back:
     → Kafka has an event for a rolled-back transaction
     → This is the worst case — extremely hard to detect

The outbox pattern (covered in the Outbox Pattern post) solves this by writing the event to an outbox table in the same database transaction. But someone still needs to read the outbox table and publish to Kafka. This is exactly what CDC does — and it can do it for any table, not just the outbox.

What CDC Does

Change Data Capture reads the database’s internal transaction log (MySQL binlog, PostgreSQL WAL, MongoDB oplog) and streams every row-level change (INSERT, UPDATE, DELETE) as a structured event to a downstream system — typically Kafka.

flowchart LR
    subgraph "Source Database"
        App([Application]) -->|"INSERT/UPDATE/DELETE"| DB[(PostgreSQL)]
        DB -->|"every write goes to"| WAL[Write-Ahead Log
pg_wal]
    end

    subgraph "CDC Connector"
        Deb[Debezium
reads WAL position
converts to events]
    end

    WAL --> Deb

    subgraph "Event Stream"
        K[Kafka Topic
db.public.orders]
    end

    Deb --> K

    subgraph "Consumers"
        ES[Elasticsearch
search index]
        Redis[Redis
cache]
        DW[Data Warehouse
analytics]
    end

    K --> ES & Redis & DW

Why reading the transaction log is reliable:

The transaction log is how the database itself ensures durability and replication. Every committed transaction is in the log. Every rolled-back transaction is not. By reading the log, CDC captures exactly the committed changes — no duplicates from retries, no phantom events from rolled-back transactions, no missed writes from failed publishes.

Application writes to DB → DB writes to WAL (for its own durability)
                          → CDC reads WAL (piggybacks on the DB's own mechanism)
                          → CDC publishes to Kafka

The application doesn't need to publish anything — CDC extracts events
from the database's internal log automatically.

How Database Logs Work

PostgreSQL: Write-Ahead Log (WAL)

PostgreSQL writes every change to the WAL before applying it to data pages. This is how it survives crashes — on recovery, it replays the WAL to reconstruct committed transactions.

CDC uses logical replication (PostgreSQL 10+) to read the WAL as structured change events:

WAL entry for an INSERT:

  LSN: 0/1A3B4C0    (Log Sequence Number — position in the WAL)
  Transaction: 42857
  Relation: public.orders
  Operation: INSERT
  New tuple: {
    id: 12345,
    user_id: 42,
    amount: 49.99,
    status: "pending",
    created_at: "2026-04-25T10:30:00Z"
  }

-- Enable logical replication in PostgreSQL
-- postgresql.conf:
--   wal_level = logical
--   max_replication_slots = 4

-- Create a replication slot for Debezium
SELECT pg_create_logical_replication_slot('debezium_slot', 'pgoutput');

-- Debezium connects as a replication client and reads from this slot
-- The slot tracks how far Debezium has read — no data loss if Debezium restarts

MySQL: Binary Log (binlog)

MySQL’s binlog records every data-modifying statement (or row change in ROW format). CDC reads the binlog in the same way a MySQL replica would.

binlog entry for an UPDATE:

  Position: mysql-bin.000042:1847
  Event: UPDATE_ROWS
  Table: ecommerce.orders
  Before: {id: 12345, status: "pending", amount: 49.99}
  After:  {id: 12345, status: "shipped", amount: 49.99}

MongoDB: Oplog

MongoDB’s oplog (operation log) is a capped collection that records every write operation. CDC tails this like a Kafka consumer tails a topic.

Debezium: The Standard CDC Tool

Debezium is the most widely used open-source CDC platform. It runs as a Kafka Connect connector — a plugin in the Kafka Connect framework that continuously streams database changes to Kafka topics.

flowchart TB
    subgraph "Source Databases"
        PG[(PostgreSQL
WAL)]
        MY[(MySQL
binlog)]
        MO[(MongoDB
oplog)]
    end

    subgraph "Kafka Connect Cluster"
        DC1[Debezium Connector
PostgreSQL source]
        DC2[Debezium Connector
MySQL source]
        DC3[Debezium Connector
MongoDB source]
    end

    PG --> DC1
    MY --> DC2
    MO --> DC3

    subgraph "Kafka"
        T1[pg-server.public.orders]
        T2[mysql-server.ecommerce.users]
        T3[mongo-server.db.products]
    end

    DC1 --> T1
    DC2 --> T2
    DC3 --> T3

Debezium Event Structure

Every change event contains the before state, the after state, the operation type, and metadata:

{
  "schema": { ... },
  "payload": {
    "before": {
      "id": 12345,
      "status": "pending",
      "amount": 49.99
    },
    "after": {
      "id": 12345,
      "status": "shipped",
      "amount": 49.99
    },
    "source": {
      "connector": "postgresql",
      "db": "ecommerce",
      "schema": "public",
      "table": "orders",
      "lsn": 27396632,
      "txId": 42857,
      "ts_ms": 1714041000000
    },
    "op": "u",
    "ts_ms": 1714041000123
  }
}

Field	Value	Meaning
`op`	`"c"`	CREATE (INSERT)
`op`	`"u"`	UPDATE
`op`	`"d"`	DELETE
`op`	`"r"`	READ (initial snapshot)
`before`	Previous row state	`null` for INSERTs
`after`	New row state	`null` for DELETEs
`source.lsn`	Log Sequence Number	Position in the WAL — enables exactly-once tracking

Debezium Connector Configuration

{
  "name": "orders-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db-primary.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "${secrets.db_password}",
    "database.dbname": "ecommerce",
    "topic.prefix": "pg-server",
    "table.include.list": "public.orders,public.users",
    "slot.name": "debezium_orders",
    "plugin.name": "pgoutput",
    "snapshot.mode": "initial",
    "transforms": "route",
    "transforms.route.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.route.topic.regex": "(.*)orders(.*)",
    "transforms.route.topic.replacement": "order-events"
  }
}

Initial Snapshot

When a CDC connector starts for the first time, the existing data in the table hasn’t been through the log yet. Debezium performs an initial snapshot — it reads the entire table and emits each row as a "r" (read) event. After the snapshot completes, it switches to streaming mode and reads only new changes from the WAL.

Startup sequence:
  1. Lock table briefly to get a consistent snapshot point
  2. Read all existing rows → emit as "r" events to Kafka
  3. Note the current WAL position
  4. Release lock
  5. Start streaming changes from the noted WAL position → "c"/"u"/"d" events

  Result: Kafka topic contains the complete table state + ongoing changes

Use Cases

1. Cache Invalidation

flowchart LR
    App([App writes to DB]) --> DB[(PostgreSQL)]
    DB -->|WAL| Deb[Debezium]
    Deb --> Kafka[Kafka]
    Kafka --> CacheSync[Cache Sync Consumer]
    CacheSync -->|"on INSERT/UPDATE:
SET key value"| Redis[(Redis)]
    CacheSync -->|"on DELETE:
DEL key"| Redis

Problem: Cache-aside pattern requires the app to update Redis explicitly. If the app forgets (bug) or crashes before updating Redis, the cache is stale. Invalidation logic is scattered across every code path that writes to the DB.

CDC solution: A single consumer reads change events from Kafka and updates Redis automatically. On INSERT/UPDATE/snapshot-read events, the consumer writes the row data to a Redis hash keyed by the record’s primary key (with a TTL). On DELETE events, it removes the key. Every database write — from any code path, any service, even manual SQL — flows through the WAL and reaches Redis. No code changes needed in the application.

2. Search Index Sync

Problem: Elasticsearch must reflect database changes within seconds. The app could call the Elasticsearch API after every DB write, but that’s another dual-write problem.

CDC solution: Debezium streams changes to Kafka. An Elasticsearch sink connector (or a custom consumer) reads events and indexes documents on INSERT/UPDATE, and deletes documents on DELETE. The consumer maps the CDC event’s after payload to an Elasticsearch document, using the row’s primary key as the document ID.

3. Outbox Pattern with CDC

The cleanest combination: the application writes business data + an outbox event in a single database transaction. Debezium reads the outbox table and publishes the event to Kafka. No polling, no dual writes.

flowchart LR
    subgraph "Single DB Transaction"
        App([Order Service]) -->|"1. INSERT order"| DB[(PostgreSQL)]
        App -->|"2. INSERT outbox event"| DB
    end

    DB -->|"WAL includes
outbox INSERT"| Deb[Debezium
reads outbox table only]
    Deb -->|"publish event"| Kafka[Kafka
order-events topic]

    Kafka --> S1[Search Service]
    Kafka --> S2[Notification Service]
    Kafka --> S3[Analytics Pipeline]

The application writes the business data (e.g., an order row) and an outbox event row in a single database transaction. Debezium is configured to capture only the outbox_events table and uses the Outbox Event Router transform to route events to the correct Kafka topic based on the aggregate_type field.

4. Data Lake / Warehouse Ingestion

CDC streams database changes to S3 (via Kafka Connect S3 Sink) in Parquet format, building a continuously-updated data lake:

flowchart LR
    DB[(PostgreSQL)] -->|WAL| Deb[Debezium]
    Deb --> Kafka[Kafka]
    Kafka --> S3Sink[Kafka Connect
S3 Sink Connector]
    S3Sink -->|"Parquet files
partitioned by date"| S3[(S3 Data Lake)]
    S3 --> DW[Snowflake / BigQuery
Data Warehouse]

This replaces nightly batch ETL with near-real-time ingestion — the warehouse is minutes behind the production database instead of hours.

Schema Evolution

As the application evolves, database schemas change: columns are added, renamed, or removed. CDC events must handle this gracefully — old consumers receiving new-format events and new consumers receiving old-format events.

The Problem

Week 1: orders table has columns (id, user_id, amount, status)
Week 3: add column "currency" (default: "USD")
Week 5: rename "amount" to "total_amount"

Consumer deployed in Week 1 receives a Week 5 event:
  {id: 123, user_id: 42, total_amount: 49.99, currency: "USD", status: "pending"}
  → Consumer code: event["amount"] → KeyError! Consumer crashes.

Compatibility Rules

Compatibility	Rule	Example
Forward compatible	Old consumers can read new events (new events may have extra fields)	Add a column → old consumers ignore the new field
Backward compatible	New consumers can read old events (old events may lack some fields)	New consumer handles missing `currency` with a default value
Full compatible	Both directions work	Only add optional fields with defaults; never remove or rename

Schema Registry

Debezium integrates with Confluent Schema Registry (or Apicurio) to enforce compatibility rules:

flowchart LR
    Deb[Debezium] -->|"register schema v1"| SR[Schema Registry]
    Deb -->|"publish event
(schema v1 reference)"| Kafka[Kafka]

    Note[Later: column added] -.-> Deb
    Deb -->|"register schema v2
(must be compatible)"| SR
    SR -->|"compatibility check:
v2 backward-compatible with v1? ✓"| SR

    Consumer[Consumer] -->|"fetch schema by ID"| SR
    Consumer -->|"deserialize event
using correct schema version"| Consumer

Consumers should defensively handle schema changes: use .get() with defaults for new fields (forward compatibility), check for renamed fields by trying both old and new names (backward compatibility), and never assume a field exists.

Rule of thumb: Only make additive changes (add columns with defaults, add new tables). Never rename or remove columns in production. If you must, create a new column, backfill it, migrate consumers, then drop the old column — over weeks, not in one deploy.

CDC vs Alternatives

Approach	How it works	Pros	Cons
Application-level events	App publishes to Kafka explicitly after DB write	Full control over event shape; no DB dependency	Dual-write problem; events can diverge from DB state
Polling	Consumer queries DB on a timer: `WHERE updated_at > last_check`	Simple; no log access needed	Latency = poll interval; misses deletes; `updated_at` must exist on every table; load on DB
Database triggers	DB trigger fires on INSERT/UPDATE/DELETE and writes to an event table	Guaranteed to capture every change	Triggers add write latency; complex to manage; DB-specific; hard to scale
CDC (log-based)	Read the database’s transaction log directly	No application changes; captures all changes including manual SQL; no write overhead; exactly the committed changes	Requires DB log access; schema-coupled events; operational complexity (Debezium, Kafka Connect)

Production Considerations

Replication Slot Management (PostgreSQL)

PostgreSQL replication slots track how far a consumer has read the WAL. If the consumer stops (Debezium crashes, network partition), the slot prevents PostgreSQL from removing unread WAL segments. If Debezium is down for too long, WAL segments accumulate and fill the disk.

-- Monitor replication slot lag
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
       AS lag_bytes
FROM pg_replication_slots;

-- Alert if lag exceeds threshold (e.g., 1 GB)
-- If Debezium is permanently stuck, drop the slot to prevent disk full:
-- SELECT pg_drop_replication_slot('debezium_orders');
-- (This means Debezium must re-snapshot when it restarts)

Monitoring

Metric	What to watch	Alert threshold
Replication slot lag	Bytes of unread WAL	> 500 MB → Debezium is falling behind
Event latency	Time from DB commit to Kafka publish	> 10 seconds → CDC pipeline stalled
Connector status	Debezium connector state (RUNNING/FAILED)	Any FAILED state → immediate alert
Kafka consumer lag	Downstream consumers falling behind CDC topic	Growing lag → consumer can’t keep up
Disk usage on DB server	WAL accumulation from stalled replication slot	> 80% → risk of disk full

Handling Debezium Failures

Scenario: Debezium crashes at WAL position 1000

  1. Debezium records its last committed offset in Kafka Connect
  2. On restart, Debezium resumes from offset 1000
  3. Events 1000+ are re-read from the WAL and re-published to Kafka
  4. Downstream consumers receive duplicates for events 1000+
  5. Consumers must be IDEMPOTENT (deduplicate by event ID or source LSN)

  This is at-least-once delivery — same as the outbox pattern.
  Exactly-once requires idempotent consumers.

End-to-End Architecture

flowchart TB
    subgraph "Application Layer"
        OS[Order Service]
        US[User Service]
        PS[Product Service]
    end

    subgraph "Source Databases"
        DB1[(Orders DB
PostgreSQL)]
        DB2[(Users DB
PostgreSQL)]
        DB3[(Products DB
MongoDB)]
    end

    OS --> DB1
    US --> DB2
    PS --> DB3

    subgraph "CDC Layer"
        D1[Debezium
PostgreSQL Connector]
        D2[Debezium
PostgreSQL Connector]
        D3[Debezium
MongoDB Connector]
    end

    DB1 --> D1
    DB2 --> D2
    DB3 --> D3

    subgraph "Kafka"
        T1[orders topic]
        T2[users topic]
        T3[products topic]
    end

    D1 --> T1
    D2 --> T2
    D3 --> T3

    subgraph "Downstream Consumers"
        Search[Search Service
→ Elasticsearch]
        Cache[Cache Sync
→ Redis]
        Analytics[S3 Sink
→ Data Lake]
        Notify[Notification
Service]
    end

    T1 & T2 & T3 --> Search
    T1 --> Cache
    T1 & T2 & T3 --> Analytics
    T1 --> Notify

    subgraph "Schema Management"
        SR[Schema Registry
compatibility enforcement]
    end

    D1 & D2 & D3 --> SR
    Search & Cache & Analytics & Notify --> SR

ℹ️

Interview tip: When discussing how to keep multiple systems in sync (cache, search index, data warehouse), say: “I’d use Change Data Capture with Debezium reading the PostgreSQL WAL. Every committed row change — INSERT, UPDATE, DELETE — is captured from the transaction log and published to a Kafka topic. This avoids the dual-write problem because the application only writes to the database; CDC extracts events from the DB’s own replication log, which guarantees we see exactly the committed changes. Downstream consumers (Elasticsearch for search, Redis for cache, S3 for the data lake) each read from the Kafka topic independently. For the outbox pattern, the app writes business data and an outbox event in one transaction; Debezium reads the outbox table and publishes to Kafka. Consumers must be idempotent because CDC provides at-least-once delivery — if Debezium restarts, it re-reads from the last committed WAL position and may republish events. Schema evolution is handled via Schema Registry with backward-compatible changes only.”

Test Your Understanding

Your colleague argues: ‘Why use CDC when we can just have the application publish events to Kafka after each DB write? It’s simpler and we control the event schema.’ What’s the strongest counter-argument?

The dual-write problem is not hypothetical — it’s guaranteed to bite you. The application writes to the DB, then publishes to Kafka. Three failure modes:

DB write succeeds, Kafka publish fails → data in DB, not in downstream systems.
Kafka publish succeeds, DB transaction rolls back → phantom event for non-existent data.
Both succeed but app crashes before confirming → retry publishes a duplicate.

CDC eliminates all three because it reads from the DB’s own transaction log — it sees exactly the committed changes. The DB’s WAL is the single source of truth; CDC is just tailing it.

The event schema argument is valid: CDC events are schema-coupled to the DB table. If you need a different event shape, use CDC to read the outbox table (where you control the event schema) rather than reading application tables directly.

Debezium is streaming CDC events from PostgreSQL to Kafka. The Debezium connector crashes and stays down for 6 hours. During that time, PostgreSQL accumulates WAL segments that it can’t reclaim because the replication slot holds them. Your DB disk fills up and the production database stops accepting writes. How do you prevent this?

Set max_slot_wal_keep_size in PostgreSQL (v13+). This limits how much WAL a replication slot can retain. When the limit is exceeded, PostgreSQL reclaims the oldest WAL segments even if the slot consumer hasn’t read them.

The trade-off: If Debezium comes back after the WAL was reclaimed, it can’t resume from its last position. It must perform a full re-snapshot of the tables — which is expensive but recoverable. The alternative (disk full, database down) is catastrophic.

Production checklist:

Alert on replication slot lag (pg_wal_lsn_diff) at 500MB and 1GB thresholds.
Set max_slot_wal_keep_size = 10GB as a safety valve.
Monitor Debezium connector health; auto-restart on failure.
If Debezium is permanently decommissioned, always drop the replication slot — orphaned slots are the #1 cause of WAL disk-full incidents.

You use the outbox pattern: the app writes business data + outbox event in one transaction. Debezium reads the outbox table. But your app inserts 10,000 orders/second, each generating an outbox event. The outbox table grows unboundedly. How do you manage it?

Delete outbox rows after Debezium has consumed them. Two approaches:

Debezium Outbox Event Router with table.op.invalid.behavior=warn. Configure Debezium to capture only INSERT events from the outbox table. After CDC captures the INSERT, a separate cleanup job deletes rows older than N minutes. The DELETE events are ignored by the outbox router.
Truncate on a schedule. Run DELETE FROM outbox_events WHERE created_at < now() - interval '1 hour' every 10 minutes. Since Debezium reads from the WAL (not the table), deleting rows doesn’t affect CDC — the INSERT events are already in the WAL.

Key insight: The outbox table is a transient staging area, not a permanent store. Kafka is the durable event log. The outbox table should stay small (< 100K rows) at all times.

Data Warehouse Basics