Two-Phase Commit (2PC)

Two-Phase Commit is a protocol that ensures all participants in a distributed transaction either all commit or all abort. It solves the atomic commit problem: when a transaction spans multiple nodes, how do you guarantee that no node commits while another aborts?

The Atomic Commit Problem

A single database handles atomicity with a local transaction log — write the commit record, then it’s committed. But when a transaction touches data on multiple nodes (e.g., transferring money between two banks), a local commit on one node says nothing about the other.

The challenge: any participant can fail at any time. If Node A commits and Node B crashes before committing, the system is in an inconsistent state — the money left one account but never arrived at the other.

2PC solves this by splitting the commit into two phases, with a coordinator orchestrating the decision.

The Protocol

Phase 1 — Prepare

sequenceDiagram
    participant C as Coordinator
    participant A as Participant A (Bank 1)
    participant B as Participant B (Bank 2)

    C->>C: Write BEGIN to transaction log

    par Prepare in parallel
        C->>A: PREPARE (txn=T1, debit $100 from account X)
        C->>B: PREPARE (txn=T1, credit $100 to account Y)
    end

    Note over A: Acquire row lock on account X
Write undo log (balance=500)
Write redo log (balance=400)
fsync to disk Note over B: Acquire row lock on account Y
Write undo log (balance=200)
Write redo log (balance=300)
fsync to disk A->>C: YES — I promise I can commit B->>C: YES — I promise I can commit

A YES vote is an irrevocable promise: the participant guarantees it can commit the transaction at any point in the future, even after a crash and recovery. This is why it must fsync its redo log before voting — the data must survive a reboot.

A NO vote means the participant cannot commit (constraint violation, deadlock, disk failure). The coordinator will abort the entire transaction.

Phase 2 — Commit or Abort

sequenceDiagram
    participant C as Coordinator
    participant A as Participant A
    participant B as Participant B

    Note over C: All voted YES

    C->>C: Write COMMIT to transaction log (fsync) — the commit point

    par Commit in parallel
        C->>A: COMMIT (txn=T1)
        C->>B: COMMIT (txn=T1)
    end

    Note over A: Apply redo log
Release row lock Note over B: Apply redo log
Release row lock A->>C: ACK B->>C: ACK C->>C: Write END to transaction log
ℹ️

The commit point is the moment the coordinator writes COMMIT to its durable log — not when participants acknowledge. Once that record is fsync’d, the transaction is committed regardless of subsequent crashes.

If any participant voted NO:

sequenceDiagram
    participant C as Coordinator
    participant A as Participant A
    participant B as Participant B

    A->>C: YES
    B->>C: NO (constraint violation)

    C->>C: Write ABORT to transaction log

    par Abort in parallel
        C->>A: ABORT (txn=T1)
        C->>B: ABORT (txn=T1)
    end

    Note over A: Apply undo log
Release row lock Note over B: Release lock (never modified data)

Failure Modes

This is where 2PC gets interesting — and where it earns the label “blocking protocol.”

Participant Crash Before Voting

Coordinator sends PREPARE to A and B
A crashes before responding

Coordinator waits for timeout → no response from A
Coordinator writes ABORT → sends ABORT to B
B rolls back, releases locks

A recovers: checks with coordinator → "transaction was aborted" → rolls back

Impact: The coordinator simply aborts. No blocking.

Participant Crash After Voting YES

A votes YES, then crashes immediately

Coordinator received YES from both A and B
Coordinator writes COMMIT, sends COMMIT to both
B commits successfully, A is down

A recovers: reads its redo log → "I voted YES for T1"
A asks coordinator: "What happened to T1?"
Coordinator: "T1 was committed"
A applies redo log → commits

Impact: A is temporarily unavailable but recovers correctly. Its YES vote guaranteed it could commit later.

Coordinator Crash — The Blocking Problem

This is the critical failure mode that defines 2PC’s weakness:

sequenceDiagram
    participant C as Coordinator
    participant A as Participant A
    participant B as Participant B

    par Prepare
        C->>A: PREPARE
        C->>B: PREPARE
    end

    A->>C: YES (locks held, redo log written)
    B->>C: YES (locks held, redo log written)

    Note over C: 💥 Coordinator crashes BEFORE
writing COMMIT or ABORT Note over A: Voted YES — cannot abort unilaterally
Cannot commit — don't know if B voted YES
BLOCKED: holding locks, waiting for coordinator Note over B: Same situation — BLOCKED Note over A,B: Both participants stuck holding locks
until coordinator recovers and replays decision

Why this is devastating:

  • Participants hold row-level locks during the entire wait
  • Other transactions that need those rows are blocked too
  • If the coordinator’s disk is destroyed, participants may be stuck indefinitely
  • No participant can safely make a unilateral decision — committing when the coordinator intended to abort (or vice versa) would violate atomicity

Coordinator Recovery

The coordinator’s durable transaction log enables recovery:

Log StateRecovery Action
No record of T1Abort (never started Phase 2)
BEGIN but no decisionAbort (PREPARE may have been sent, but no commit decided)
COMMIT writtenRe-send COMMIT to all participants
ABORT writtenRe-send ABORT to all participants
END writtenNothing to do — transaction fully completed

Participants that time out waiting for a decision can ask the coordinator (or, in cooperative recovery protocols, ask other participants) for the outcome.

Performance Cost

Timeline of a 2PC transaction:

Client ──── BEGIN TXN ──── PREPARE ──── COMMIT ──── END
                              │            │
              Locks acquired ─┘            └─ Locks released

Lock hold time = network RTT (prepare) + coordinator fsync
               + network RTT (commit) + participant fsync
             ≈ 4-20ms locally, 100-400ms cross-datacenter
CostDetail
Network round trips2 (prepare + commit), 4 messages minimum
Forced disk writesCoordinator: 2 (commit record, end record). Each participant: 1 (redo log before YES vote)
Lock durationEntire protocol — not just the local operation
LatencyBounded by the slowest participant

Cross-datacenter 2PC is particularly painful: a 100ms RTT between datacenters means locks are held for 200-400ms, dramatically reducing throughput.

XA Transactions

XA is the industry standard API for 2PC across heterogeneous resources (e.g., PostgreSQL + Oracle + a message queue in the same transaction).

Application (Transaction Manager)
    ├── xa_start() → PostgreSQL (Resource Manager 1)
    ├── xa_start() → Oracle (Resource Manager 2)
    │
    │   ... execute SQL on both databases ...
    │
    ├── xa_prepare() → PostgreSQL → YES
    ├── xa_prepare() → Oracle → YES
    ├── xa_commit() → PostgreSQL → OK
    └── xa_commit() → Oracle → OK

Why XA is rare in practice:

  • Each resource manager holds an open connection and locks during the entire protocol
  • The transaction manager (coordinator) is a single point of failure
  • Performance is 2-10x worse than local transactions
  • Most modern applications use microservices with separate databases — XA doesn’t work across independent services

When to Use 2PC vs Alternatives

ScenarioUseWhy
Cross-shard commit within one database (Spanner, CockroachDB)2PCControlled environment, low latency between shards, database manages coordinator
Cross-service business transaction (Order → Payment → Inventory)SagaServices are independent, locks across services are impractical
Database + message queue atomicityOutbox patternAvoids distributed transaction entirely
Read-only cross-node queryNeitherNo atomicity needed for reads
⚠️

Interview red flag: If a candidate proposes 2PC across microservices (“we’ll use a distributed transaction to ensure the order and payment are atomic”), that signals a misunderstanding of microservice architecture. 2PC requires tight coupling and shared failure domains — the opposite of why you chose microservices. The correct answer is the Saga pattern with compensating transactions.

Three-Phase Commit (3PC)

3PC adds a pre-commit phase between prepare and commit to eliminate the blocking problem:

Phase 1: PREPARE → YES/NO (same as 2PC)
Phase 2: PRE-COMMIT → ACK (participants know commit is coming)
Phase 3: COMMIT → ACK

If the coordinator crashes after pre-commit, participants know the decision was to commit (they received pre-commit) and can proceed without the coordinator. If they never received pre-commit, they can safely abort.

Why 3PC is not used in practice:

  • Requires a synchronous network (bounded message delay) — unrealistic in real networks
  • Network partitions can still cause inconsistency (one side commits, the other aborts)
  • The extra round trip adds latency without solving the real-world problem
  • Paxos/Raft-based commit protocols (e.g., Spanner) solve the problem properly