CAP Theorem
A fiber cut between us-east-1a and us-east-1b just isolated half your database cluster from the other half. Both halves are alive; both can serve traffic; neither can talk to the other. Right now your inventory service is being asked: “Is this product in stock?” — and both sides have to answer something. Do you return potentially stale stock counts (and risk overselling) or refuse the request entirely (and lose the sale)? There is no third option — and that is the heart of the CAP theorem. It forces every distributed system to declare, in advance, what it does when the network breaks.
The CAP theorem states that a distributed system can satisfy at most two of three properties simultaneously: Consistency, Availability, and Partition Tolerance. During a network partition, you must choose between consistency and availability — you cannot have both.
The Three Properties
Consistency (C)
Every read returns the most recent write or an error. All nodes see the same data at the same time.
This is linearizability in CAP terms — the system behaves as if there is one copy of the data, and every operation takes effect atomically at some point between its start and completion. It is not the same as eventual consistency.
CAP “consistency” ≠ ACID “consistency”. CAP’s C means linearizability — every read sees the most recent write across all nodes. ACID’s C means the database transitions between valid states (constraints satisfied). A system can be ACID-consistent (all constraints hold) while being CAP-inconsistent (replicas return stale data). Confusing the two in an interview is a common red flag.
Node A ──── write x=2 ────► Node A: x=2
Node B: x=2 ← must reflect the write immediately
Node C: x=2
Any read from any node must return x=2 after the write commits.
A read returning x=1 (old value) is a consistency violation.Availability (A)
Every request to a non-failing node receives a response — not an error, not a timeout. The response does not need to contain the most recent data; it just cannot be an error.
Key nuance: A system is available in the CAP sense even if it returns stale data, as long as it responds. A system that returns “503 Service Unavailable” or times out during a partition is not available.
Partition Tolerance (P)
The system continues operating even when the network drops or delays messages between nodes. Nodes cannot communicate with each other during a partition, but the system keeps running.
Partition tolerance is not optional. Networks fail. Links go down. Switches lose packets. In any real distributed system deployed across multiple machines or datacenters, you must assume partitions will occur. P is not a choice — it is a requirement of being distributed.
This means the real choice is: CP or AP.
Why You Can’t Have All Three During a Partition
When a partition occurs, two nodes cannot communicate. A write has been applied to Node A but not yet to Node B.
sequenceDiagram
participant C1 as Client 1
participant A as Node A
participant B as Node B
participant C2 as Client 2
Note over A,B: ✗ Network partition — A and B cannot communicate
C1->>A: write x = 2
A->>A: x = 2 (stored locally)
A->>C1: write ACK
Note over B: B still has x = 1 (old value)
C2->>B: read x
alt CP — choose Consistency
B->>C2: ERROR (cannot confirm latest value — sacrifices Availability)
else AP — choose Availability
B->>C2: x = 1 (stale — sacrifices Consistency)
endThe impossibility is simple: Node B cannot return x = 2 because the partition prevents it from receiving the write. It has two choices:
- Refuse the read (CP) — returns an error or blocks until the partition heals. The system remains consistent but gives up availability.
- Return stale data (AP) — responds immediately with
x = 1. The system remains available but serves an inconsistent response.
There is no third option. You cannot conjure the value x = 2 on Node B without receiving it from Node A.
CP Systems
CP systems choose consistency over availability during a partition. The mechanism is always the same: a write must reach a majority (quorum) before it commits. A node that cannot reach a quorum refuses requests rather than risk serving stale data, until the partition heals.
5-node cluster, partition splits into [3, 2]:
Majority side (3 nodes): can form quorum → serves reads and writes
Minority side (2 nodes): cannot form quorum → refuses all requests
After the partition heals, all nodes converge on the same state.You’ll see this in coordination and metadata stores, where handing out stale data (a wrong leader address, a deleted config) is worse than briefly refusing requests:
| System | Quorum mechanism | Typical role |
|---|---|---|
| ZooKeeper | Zab atomic broadcast | Leader election, config, locks |
| etcd | Raft | Kubernetes control-plane state |
| HBase | Coordinated via ZooKeeper | Fences off partitioned RegionServers |
AP Systems
AP systems choose availability over consistency during a partition. Every node keeps accepting reads and writes, so partitions may diverge; after the partition heals, the system reconciles the conflicting versions. Two ideas matter more than any product:
- Tunable consistency — the CP/AP label isn’t fixed. A quorum-based store behaves as AP when it answers from a single replica (fast, maybe stale) and as CP when it requires a majority (refuse rather than serve stale). The same cluster sits on different sides of CAP depending on the per-operation setting.
- Reconciliation on reconnect — because both sides accepted writes, the system must merge them: last-write-wins by timestamp, version/vector clocks to detect concurrent writes, or CRDTs where any merge order is correct. Availability during the partition is paid for with reconciliation complexity after it.
| System | Default | How it stays available |
|---|---|---|
| Cassandra | AP at ONE, CP-leaning at QUORUM | Tunable consistency level per operation |
| DynamoDB | AP (eventually consistent reads) | Strongly consistent reads are opt-in (CP-like) |
| CouchDB | AP | Multi-master, merge-on-reconnect via revision trees |
System Classification
| System | Default behavior | Reason |
|---|---|---|
| ZooKeeper | CP | Zab requires quorum for all writes; minority partition rejects requests |
| etcd | CP | Raft quorum required; minority partition is read-only or unavailable |
| HBase | CP | Coordinated by ZooKeeper; consistency over availability |
| Cassandra | AP (default) | Tunable; ONE = AP, QUORUM = closer to CP |
| DynamoDB | AP (default) | Eventually consistent reads by default; strong reads = CP-like |
| CouchDB | AP | Multi-master with merge-on-reconnect; always accepts writes |
| MongoDB | CP (default) | Primary-only writes via Raft; secondary reads are optionally AP |
| Spanner | CP | TrueTime-based external consistency; minority partition unavailable |
Common Misconceptions
“You pick 2 of 3.” The framing implies P is optional — it isn’t. Any system spanning multiple machines must tolerate partitions, so the real choice is CP or AP when one occurs. With no partition, a system can satisfy all three at once.
“CA systems exist.” CA only describes a single node, which has no partition to tolerate. “MySQL is CA” means single-node MySQL; the moment you add a replica, the primary–replica link can partition and you must choose CP or AP.
“CAP-C = ACID-C.” CAP’s C is linearizability (every read reflects the latest write globally); ACID’s C is invariant preservation (constraints hold within a transaction). A system can be ACID-consistent on one node while returning stale cross-node reads during a partition.
“Cassandra is always AP.” At ALL (or QUORUM when a partition drops below quorum) it refuses the operation rather than serve stale data — CP behavior. The label depends on the configured consistency level, not the product.
CAP in Practice
The binary CP/AP framing is a simplification. Real systems are more nuanced:
| Approach | How it works |
|---|---|
| Tunable consistency | Cassandra, DynamoDB let you choose per-operation — pay higher latency for stronger consistency on critical reads |
| Read-your-writes guarantee | Route reads that must be fresh to the primary (CP path); route stale-ok reads to replicas (AP path) |
| Optimistic concurrency | AP system accepts all writes; detect conflicts on read using version vectors; resolve in application |
| Fencing tokens | Accept writes on both sides of partition; use monotonic tokens to discard stale writers after reconnection |
The PACELC model extends CAP to describe the latency vs. consistency tradeoff that exists even in the absence of partitions — which is the more common operating condition.
Interview tip: I always lead with: “P is not optional — networks partition, so the real question is CP or AP per data type.” Then I separate concerns: inventory, balances, and idempotency keys are CP (refuse the request rather than oversell or double-charge); feeds, view counts, and search indexes are AP (stale is fine, downtime isn’t). I’m careful never to conflate CAP’s C — which is linearizability, a single-object cross-replica freshness guarantee — with ACID’s C, which is invariant preservation within a single node. And I follow up with PACELC because partitions are rare; the real cost most days is the latency-vs-consistency tradeoff on every read.
Test Your Understanding
An AP database allows local writes on both sides of a network split to stay available. The partition heals — and both sides accepted conflicting writes to the same key. What does CAP actually promise you here, and who cleans up the mess?
CAP promises nothing about reconciliation. Choosing A during the partition only guarantees both sides kept responding — it says nothing about how the divergent values get merged afterward. That is entirely your problem to solve at the application or storage layer.
The conflict-resolution strategy is what you actually have to design: last-write-wins (highest timestamp wins — but clock skew can silently discard a newer write), vector clocks (detect which writes were truly concurrent), CRDTs (data structures where any merge order is correct — counters, add-only sets), or multi-version (keep both, let the app/user resolve, as CouchDB does). The lesson: availability during a partition is not free — you pay for it later with reconciliation complexity.
A colleague says ‘we run MySQL, so we’re CA — consistent and available.’ Why is that statement meaningless the moment you add a single read replica?
Because CA only describes a single-node system, which has no partition to tolerate. A lone MySQL instance never faces a network split between nodes, so the CP/AP question is moot. The instant you add a replica, you have a distributed system, and the link between primary and replica can partition.
Now you must choose: does the replica keep serving reads when it loses contact with the primary (AP — risk stale data), or does it refuse reads to avoid serving staleness (CP — risk unavailability)? “CA” was never a deployment choice you could make for a distributed system — P is mandatory, so the real options are CP or AP.
Your inventory service is CP, so during a partition the minority side refuses writes. A stakeholder asks: ‘Why not let both sides keep selling and reconcile the counts later?’ What’s the failure mode you’re protecting against?
Overselling — and you can’t un-sell. With both sides accepting orders for the last unit in stock, each side sees stock available (neither can see the other’s decrement during the partition), so both sell it. After the partition heals you discover you sold the same physical item twice. Reconciliation can detect the conflict but cannot reverse a shipped order or an honored promise to a customer.
This is exactly why inventory, balances, and seat reservations are CP: refusing the request (a lost sale, recoverable) is strictly better than serving a stale “in stock” answer (an oversell, not recoverable). The CP choice trades availability for correctness on purpose.
Cassandra is widely called an ‘AP’ system. Under what configuration does it behave like a CP system instead — and what does that tell you about the CP/AP label?
At consistency level ALL (or QUORUM when a partition drops you below quorum), Cassandra refuses the operation rather than serve a potentially stale result — that’s CP behavior. With ALL, every replica must acknowledge; if one is unreachable due to a partition, the read or write fails.
The takeaway: CP vs AP is not a fixed property of a database — it’s a per-operation decision for systems with tunable consistency. The same Cassandra cluster is AP at ONE (answer fast, maybe stale) and CP-leaning at QUORUM/ALL (refuse rather than risk staleness). Classifying a whole system as “AP” is a simplification; the honest answer is “AP at these consistency levels, CP at those.”