Multi-Region Design & Geo-Replication

Multi-Region Design & Geo-Replication

Multi-region architecture deploys an application across geographically separated datacenters to reduce latency, survive regional outages, and comply with data residency laws. The core tension is between consistency (all regions see the same data) and latency (cross-region round trips cost 50–200ms).

Why Go Multi-Region?

DriverDetail
LatencyUsers in Tokyo hitting a US-East server see ~150ms network RTT. A local region cuts this to ~5ms.
AvailabilityA single-region system has an availability ceiling of ~99.99% (one region’s SLA). Multi-region with failover reaches 99.999%.
ComplianceGDPR requires EU user data processed in the EU. Brazil’s LGPD, India’s DPDP Act — all mandate data residency.
Disaster recoveryFire, power outage, fiber cut, cloud provider outage — a second region keeps the business running.

Active-Passive

One region handles all traffic. The other region is a warm standby that receives replicated data but serves no user requests.

sequenceDiagram
    participant U as Users (Global)
    participant DNS as DNS / Load Balancer
    participant P as Primary (US-East)
    participant S as Standby (EU-West)

    U->>DNS: api.example.com
    DNS->>U: 52.1.2.3 (US-East IP)

    U->>P: All reads and writes
    P->>S: Async replication (50-200ms lag)

    Note over P: 💥 US-East region failure

    DNS->>DNS: Health check fails → update DNS
    DNS->>U: 34.5.6.7 (EU-West IP)

    U->>S: All traffic now hits standby
    Note over S: Promoted to primary
Accepts reads and writes
MetricTypical Value
RTO (Recovery Time Objective)1–30 minutes (DNS TTL + promotion time)
RPO (Recovery Point Objective)Seconds to minutes (async replication lag)
Data loss on failoverUp to the replication lag at the time of failure
CostStandby resources sit idle (30–50% of primary cost)

Pros: Simple — no conflict resolution, one source of truth. Cons: Standby is wasted capacity. Global users have high latency (all traffic goes to one region). Failover is manual or semi-automated.

Active-Active

Multiple regions serve traffic simultaneously. Each region has its own database (or a partition of the global database). Users are routed to their nearest region.

sequenceDiagram
    participant EU as EU User
    participant US as US User
    participant GR as Geo-Router (DNS/LB)
    participant R1 as EU-West Region
    participant R2 as US-East Region

    EU->>GR: api.example.com
    GR->>EU: 34.5.6.7 (EU-West)
    EU->>R1: Read/Write (local, ~5ms)

    US->>GR: api.example.com
    GR->>US: 52.1.2.3 (US-East)
    US->>R2: Read/Write (local, ~5ms)

    R1->>R2: Async cross-region replication
    R2->>R1: Async cross-region replication

    Note over R1,R2: Both regions accept writes independently
Conflicts possible when same data modified in both
MetricTypical Value
RTONear-zero (traffic re-routes automatically)
RPOZero for local writes; replication lag for cross-region
LatencyLocal region RTT (~5ms)
CostBoth regions active — full capacity in each

Pros: Low latency globally. Near-zero downtime on region failure. Cons: Conflict resolution is hard. More expensive (no idle standby — both are fully provisioned). Operational complexity.

Geo-Routing

How do users reach their nearest region?

GeoDNS

The DNS server returns different IP addresses based on the client’s geographic location (inferred from the resolver’s IP or EDNS Client Subnet).

User in Paris → DNS returns EU-West IP (34.5.6.7)
User in NYC   → DNS returns US-East IP (52.1.2.3)
User in Tokyo → DNS returns AP-NE IP   (13.8.9.10)

Limitations: DNS TTL means failover is slow (minutes). Geographic inference is approximate (VPN users, corporate resolvers). No real-time latency awareness.

Anycast

Multiple servers share the same IP address. BGP routing directs packets to the nearest server based on network topology.

All regions advertise: 198.51.100.1
User in Paris → BGP routes to EU-West (shortest AS path)
User in NYC   → BGP routes to US-East

Used by: Cloudflare, Google, all major CDNs. Best for stateless protocols (DNS, CDN). TCP Anycast requires careful handling of connection affinity.

Latency-Based Routing

The load balancer actively measures latency to each region and routes users to the region with the lowest measured latency. More accurate than GeoDNS but requires health-check infrastructure.

AWS Route 53: Supports latency-based routing natively. Measures RTT from each region to the user’s DNS resolver and returns the lowest-latency endpoint.

Cross-Region Replication Strategies

The fundamental trade-off: synchronous replication across regions adds 50–200ms of latency to every write, while asynchronous replication creates a window where regions can diverge.

flowchart TD
    A[Write arrives at local region] --> B{Replication strategy?}
    B -->|Synchronous| C[Wait for remote region ACK
+100-200ms per write] C --> D[Strong consistency
Zero data loss] B -->|Asynchronous| E[ACK immediately
Replicate in background] E --> F[Low latency
Replication lag = data loss risk] B -->|Semi-synchronous| G[ACK after local +
one remote region] G --> H[Balanced: some latency,
some lag tolerance]
StrategyWrite latencyConsistencyData loss on failure
Synchronous+100–200ms (cross-region RTT)Strong (linearizable)Zero
AsynchronousLocal only (~5ms)EventualUp to replication lag
Semi-synchronous+100–200ms (one remote ACK)Strong within quorumZero if quorum survives

Google Spanner uses synchronous replication with Paxos across regions. It accepts the latency cost (~10–50ms with TrueTime, higher for distant regions) in exchange for global strong consistency. Most systems cannot afford this trade-off and use asynchronous replication.

Conflict Resolution in Active-Active

When two regions independently modify the same data before replication catches up, you have a write conflict. This is the hardest problem in multi-region design.

Last-Write-Wins (LWW)

Use a timestamp to pick the “latest” write. The other write is silently discarded.

EU-West:  UPDATE users SET name='Alice' WHERE id=42   (ts=1000)
US-East:  UPDATE users SET name='Alicia' WHERE id=42  (ts=1001)

Conflict resolution: ts=1001 > ts=1000 → keep 'Alicia', discard 'Alice'

Pros: Simple, deterministic, no user intervention. Cons: Data loss. The “losing” write is silently dropped. Clock skew can cause the “earlier” write to win. Acceptable for low-value data (last-seen timestamps, analytics counters), dangerous for high-value data (account balances, orders).

CRDTs (Conflict-Free Replicated Data Types)

Data structures that are mathematically guaranteed to converge when merged, regardless of the order updates are applied.

CRDTWhat it doesMerge operationExample
G-CounterGrow-only counterSum of per-node countsPage view counter: EU adds 50, US adds 30 → total 80
PN-CounterCounter with increment/decrementG-Counter pair (positive + negative)Like count: +100 likes, -3 unlikes → 97
G-SetGrow-only setUnionTag set: EU adds “urgent”, US adds “billing” → {“urgent”, “billing”}
OR-SetSet with add and removeAdd wins over concurrent removeShopping cart: add item A, concurrent remove of item A → keep A
LWW-RegisterSingle value, last-write-winsHigher timestamp winsUser profile name field

Used by: Redis CRDT (enterprise), Riak (data types), Cassandra counters (G-Counter internally), Figma (real-time collaboration).

Limitation: CRDTs work for specific data structures. You can’t model arbitrary business logic as a CRDT. An “account balance” is not a simple counter — it has constraints (no negative balance) that CRDTs cannot enforce.

Application-Level Merge

The application defines custom merge logic for conflicting writes.

EU-West:  shopping_cart = {items: [A, B]}
US-East:  shopping_cart = {items: [A, C]}

Application merge: union of items → {items: [A, B, C]}

Pros: Handles complex business logic (merge shopping carts, merge document edits). Cons: Must be implemented per data type. Bugs in merge logic cause data corruption.

Conflict Flagging

Detect the conflict, store both versions, and present them to the user (or an automated resolver) for resolution.

EU-West:  address = "123 Rue de Paris"
US-East:  address = "456 Fifth Ave"

Both stored with version vectors. On next read:
  "Conflict detected — which address is correct?"
  User picks one → conflict resolved

Used by: CouchDB, Riak (siblings), Git (merge conflicts).

Read-Your-Writes in Multi-Region

After a user writes to their local region, an immediate read might hit a replica that hasn’t received the write yet (asynchronous replication lag).

sequenceDiagram
    participant U as User (EU)
    participant EU as EU-West (Primary for this user)
    participant US as US-East (Replica)

    U->>EU: Update profile picture ✓
    EU->>US: Async replication (in progress...)

    Note over U: User refreshes page
Request hits EU (local) → sees new picture ✓ Note over U: User opens mobile app
CDN/LB routes to US-East U->>US: GET profile US->>U: Returns OLD picture (replication not complete) Note over U: ❌ "Where did my picture go?"

Solutions

StrategyHowTrade-off
Sticky routingRoute all requests from a user to their write region (cookie, user-id hash)Uneven load; defeats the purpose of multi-region for that user
Read-from-primaryAfter a write, read from the primary for N secondsHigher latency for post-write reads
Causal tokensWrite returns a token (LSN/timestamp); read includes it; replica blocks until it reaches that pointSlight read latency; requires coordination
Write-region affinityEach user is “homed” to a region; all their writes go there; reads can go anywhere with causal tokensRequires user-to-region mapping

Data Sovereignty and Compliance

Multi-region design intersects with legal requirements:

GDPR (EU):         EU user data must be processed in the EU
LGPD (Brazil):     Brazilian user data requires consent and local processing
DPDP (India):      Certain categories of data must stay within India
HIPAA (US):        Health data requires specific controls regardless of location

Geo-Partitioning

Partition data by geography so each user’s data lives only in their legal jurisdiction:

User in Germany → data stored in EU-West only
User in US      → data stored in US-East only
User in Brazil  → data stored in SA-East only

Cross-region queries: route to the correct region based on user ID
Global analytics:     query each region separately, aggregate centrally

CockroachDB supports geo-partitioning natively: table rows can be pinned to specific regions based on a column value (e.g., country). Reads and writes for that row are served by the region that owns it.

Architecture Patterns Summary

flowchart TD
    A[Multi-Region Requirement] --> B{Primary driver?}
    B -->|Disaster Recovery only| C[Active-Passive]
    C --> C1[Async replication to standby
DNS failover on outage] B -->|Low latency globally| D{Can tolerate conflicts?} D -->|Yes — counters, logs, feeds| E[Active-Active + CRDTs / LWW] D -->|No — financial, orders| F{Latency budget?} F -->|Can afford +100ms writes| G[Active-Active + Synchronous Replication
e.g., Spanner, CockroachDB] F -->|Cannot afford| H[Active-Active + Geo-Partitioned Writes] H --> H1[Each user's writes go to one region
No cross-region write conflicts
Cross-region reads with causal tokens] B -->|Data residency / compliance| I[Geo-Partitioned
Data pinned to jurisdiction]
ℹ️

Interview framing: “For a globally distributed app, I’d use active-active with geo-partitioned writes — each user is homed to their nearest region, all their writes go there, so there are no write conflicts. Reads can go to any region using causal consistency tokens to ensure read-your-writes. For data that’s truly global (product catalog, configuration), I’d use CRDTs or accept eventual consistency with LWW. Cross-region replication is async to keep write latency low, with semi-synchronous for critical data.”