SQL vs NoSQL Decision Framework

SQL vs NoSQL Decision Framework

The single biggest mistake in system design interviews — and in real systems — is choosing a database before understanding the access patterns. The database is the consequence of the access pattern, not the starting point.

The framework: Define your access patterns → derive your consistency and scale requirements → select the database that satisfies them with the least operational complexity.

Step 1: Define Your Access Patterns

Before naming any database, answer these questions about your data:

QuestionWhy it matters
What is the primary read pattern? (point lookup by key, range scan, full-text, geo-proximity, aggregation, graph traversal)Different read patterns require fundamentally different index structures
What is the write pattern? (individual transactional writes, high-volume ingest, batch loads)Determines whether write amplification and transaction coordination cost is acceptable
How are entities related? (none, parent-child hierarchy, many-to-many graph)Many-to-many relationships across entities push toward SQL; graph-heavy relationships push toward graph DBs
Is the schema fixed or evolving? (stable columns, sparse/optional fields, user-defined attributes)Schema rigidity is a cost in SQL; schema-lessness is a cost in query flexibility
What is the consistency requirement? (strong consistency for every read, eventual consistency acceptable, tunable per operation)Strong consistency across distributed nodes is expensive — only pay for it where required
What is the scale target? (thousands of QPS, millions, hundreds of millions)Many systems never outgrow a single well-tuned PostgreSQL instance
What are the query patterns? (known key-based queries, ad-hoc analytical, complex joins, multi-field filters)Ad-hoc queries need SQL; known fixed queries can use NoSQL

SQL (Relational Databases)

SQL databases store data in tables with a fixed schema, enforce referential integrity via foreign keys, and support ACID transactions across multiple rows and tables in a single operation.

Core strengths:

StrengthWhy it matters
Multi-entity ACID transactionsTransfer $100 from account A to B atomically — impossible without coordinated transactions
Complex joins across entitiesCompute a report joining orders, customers, products, and shipping — one query
Ad-hoc queriesNo need to predict every query at schema design time; any column can be filtered, sorted, aggregated
Referential integrityForeign key constraints prevent orphan records — data consistency enforced at the database level
Rich indexingB-tree, partial, covering, expression, full-text — supports a wide range of query shapes
Mature toolingORMs, migration tools, query analyzers, replication, PITR — decades of ecosystem maturity

When SQL is the right choice:

  • Any system where correctness of writes matters more than write throughput (financial ledgers, order management, user accounts, inventory)
  • Systems with complex relationships between entities
  • Systems where query requirements are unpredictable or evolving (internal tooling, analytics on transactional data)
  • Moderate scale — PostgreSQL handles 10k–100k QPS with read replicas; MySQL at Meta serves billions of rows with custom sharding

Scale ceiling and how to push it:

Single primary (reads + writes)          → vertical scaling, connection pooling (PgBouncer)
Primary + read replicas                  → scale reads horizontally, writes still bottleneck at primary
Application-level sharding               → partition by user_id / tenant; each shard is an independent DB
Vitess (YouTube/PlanetScale)             → transparent sharding layer over MySQL
Citus (PostgreSQL extension)             → distributed PostgreSQL with coordinator + shards

Best engines: PostgreSQL (correctness, extensions, community), MySQL (operational maturity, Meta/GitHub scale), SQL Server (Windows/.NET ecosystems), Oracle (legacy enterprise, complex workloads).

NoSQL: Four Categories

“NoSQL” is not a single thing — it is four fundamentally different data models, each optimized for a different access pattern. Picking “NoSQL” without specifying which category is meaningless.

Key-Value Stores

Every value is opaque to the database — it stores and retrieves by key only. The application owns the structure of the value.

Access patternGET key, SET key value, DEL key — always by exact key
ConsistencySingle-key operations are atomic; no multi-key transactions
ScaleTrivially sharded by key hash; linear write and read scale
LatencySub-millisecond (Redis in-memory)
WeaknessNo secondary indexes; no range queries; no relationships; value is opaque
ExamplesRedis, DynamoDB (also supports document model), Memcached, etcd

Use for: session storage, caching, distributed locks, rate-limit counters, feature flags, leaderboards (Redis sorted sets), pub/sub.

Do not use for: data that needs to be queried by anything other than the primary key.

Document Stores

Each record is a self-contained document (JSON/BSON). Related data is embedded inside the document rather than normalized into separate tables. The database understands the document structure and can index into nested fields.

Access patternFetch a document by ID, query by any indexed field, including nested fields
ConsistencySingle-document operations are ACID (MongoDB 4.0+); multi-document transactions added but expensive
ScaleSharded by shard key; horizontal scale for reads and writes
StrengthSchema flexibility — fields can vary per document; natural fit for user profiles, product catalogs with variable attributes
WeaknessJoins between collections are expensive (done in application or via $lookup); schema-less means no enforcement
ExamplesMongoDB, Firestore, CouchDB, DynamoDB (document mode)

Use for: product catalogs (varying attributes per product type), user profiles, content management, mobile app backends, IoT device state.

Do not use for: data with many-to-many relationships that must be queried in both directions; financial transactions; data that requires ad-hoc multi-entity reporting.

The embedding trap: embedding all related data in one document (e.g., all comments inside a post document) works until the document grows unbounded. A post with 100,000 comments becomes a 10 MB document that must be fully loaded even when you need only the post title.

Wide-Column Stores

Data is stored in rows identified by a partition key. Each row can have thousands of columns, and different rows can have different columns. Optimized for high write throughput and time-series access patterns.

Access patternFetch by partition key (exact), with optional clustering column range scan
ConsistencyTunable per operation (ONE, QUORUM, ALL in Cassandra) — eventual by default
ScalePartitioned across nodes by partition key hash; near-linear horizontal write scale
StrengthExtremely high write throughput (LSM tree storage), predictable latency, no SPOF
WeaknessQuery pattern must be known at schema design time; no ad-hoc queries; no joins
ExamplesCassandra, HBase, Bigtable (Google), ScyllaDB

Use for: time-series data (IoT sensor readings, metrics, events), activity feeds, messaging (WhatsApp uses Cassandra for message storage), write-heavy logging.

Schema design is query-driven: In Cassandra, you design a table per query. Want to fetch messages by user AND by conversation? You need two tables — one partitioned by user_id, one by conversation_id. Denormalization is intentional and required.

Do not use for: transactional workloads; ad-hoc queries; small datasets (operational overhead not worth it below ~TB scale or millions of writes/sec).

Graph Databases

Stores entities (nodes) and their relationships (edges) as first-class citizens. Optimized for traversing relationships — “find all friends-of-friends within 3 hops who live in NYC” is a single query.

Access patternTraverse relationships from a starting node, path finding, pattern matching
ConsistencyTypically ACID (Neo4j)
ScaleHarder to shard — relationship traversal crosses partition boundaries
StrengthMulti-hop relationship queries that would require recursive CTEs or application-side loops in SQL
WeaknessPoor fit for non-graph queries; limited horizontal scale; smaller ecosystem
ExamplesNeo4j, Amazon Neptune, TigerGraph, Dgraph

Use for: social graphs (LinkedIn connections, Twitter follows), fraud detection (transaction graph anomalies), recommendation engines (collaborative filtering via graph traversal), knowledge graphs.

Reality check: Most systems that think they need a graph database actually have moderate relationship depth and are better served by PostgreSQL with recursive CTEs or adjacency list pattern. Reach for a graph DB when relationship traversal is the primary access pattern, not a secondary one.

Search Engines

Inverted indexes built for full-text search, relevance scoring, and faceted filtering. Not a general-purpose database — they are a read-optimized query layer, typically fed from another database.

Access patternFull-text search with relevance ranking, filtered aggregations, geo-proximity
ConsistencyNear-real-time (document visible ~1s after write); not strongly consistent
StrengthSub-second full-text search across billions of documents; rich aggregations and faceting
WeaknessNot a primary store — no ACID, limited write throughput, data must be re-indexed from source
ExamplesElasticsearch, OpenSearch, Solr, Typesense, Meilisearch

Use for: product search with filters and autocomplete, log analytics (ELK stack), document search, any query with LIKE '%term%' at scale (never efficient in SQL).

NewSQL: ACID at Horizontal Scale

NewSQL databases offer the relational model and ACID guarantees of traditional SQL databases with the horizontal scalability of NoSQL systems. They achieve this through distributed consensus (Raft/Paxos) and distributed transactions (2PC or timestamp-based ordering).

DatabaseUnderlying techCompatibilityNotes
Google SpannerTrueTime (GPS/atomic clock), PaxosProprietary SQLExternally consistent global transactions; used for Google Ads, Gmail
CockroachDBRaft consensus, MVCCPostgreSQL wire protocolSerializable transactions across nodes; geo-partitioning
TiDBRaft (TiKV storage), HTAPMySQL wire protocolSupports OLTP + OLAP on same cluster (TiFlash columnar engine)
YugabyteRaft, DocDB storagePostgreSQL + CassandraMulti-region active-active
PlanetScaleVitess + MySQLMySQL wire protocolHorizontal sharding with schema migrations without locks

When NewSQL is the right choice:

  • You need multi-entity ACID transactions but have outgrown a single PostgreSQL primary
  • You need multi-region active-active writes with strong consistency (Spanner, CockroachDB)
  • You are building a global system where data must be co-located with users for latency but consistent globally

The tradeoff: Distributed transactions require coordination across nodes — this adds latency (network round trips for consensus). A single PostgreSQL commit takes microseconds; a CockroachDB commit across 3 nodes takes 5–20ms. This is acceptable for transactional writes but not for latency-sensitive paths.

The Most Common Anti-Pattern

“We chose MongoDB/Cassandra/DynamoDB because it scales.”

This reasoning conflates two separate problems:

  1. Does our current system have a scale problem? (usually: no, not yet)
  2. Will a NoSQL database scale better when we do? (maybe, but at a cost)

What gets sacrificed when you choose NoSQL prematurely:

  • Joins become application-level logic — you now own what the database did for free
  • Ad-hoc queries become impossible — every new query shape may require a schema change or a new table
  • Transactions become distributed — coordinating writes across two NoSQL “documents” or “items” requires saga patterns, outbox patterns, or 2PC in application code
  • Schema enforcement disappears — bad data enters silently; data quality bugs are discovered at query time
  • Operational complexity increases — Cassandra cluster management, rebalancing, compaction tuning is non-trivial

PostgreSQL with proper indexing, connection pooling, and read replicas handles most systems that claim to need NoSQL scale. Instagram ran on PostgreSQL at 1 billion users. GitHub’s primary data store is MySQL. Shopify runs on MySQL with Vitess.

Choose NoSQL when you have a specific, identified access pattern that SQL cannot serve efficiently — not because you anticipate scale.

Polyglot Persistence

The answer to “which database?” in a real system is almost always “multiple databases.” Each database type is used for the workload it excels at.

Example: E-commerce platform

┌─────────────────────────────────────────────────────────────────┐
│  Access pattern          │  Database           │  Why           │
├─────────────────────────────────────────────────────────────────┤
│  Orders, payments,       │  PostgreSQL         │  ACID,         │
│  user accounts           │  (primary store)    │  joins, schema │
├─────────────────────────────────────────────────────────────────┤
│  Product catalog         │  MongoDB            │  Variable      │
│  (variable attributes)   │                     │  schema        │
├─────────────────────────────────────────────────────────────────┤
│  Product search,         │  Elasticsearch      │  Full-text,    │
│  autocomplete            │                     │  faceted filters│
├─────────────────────────────────────────────────────────────────┤
│  Session, cart,          │  Redis              │  Sub-ms,       │
│  rate limits, cache      │                     │  TTL, counters │
├─────────────────────────────────────────────────────────────────┤
│  Clickstream events,     │  ClickHouse         │  Columnar,     │
│  analytics               │                     │  aggregations  │
├─────────────────────────────────────────────────────────────────┤
│  Product images, assets  │  S3 / Object store  │  Blob storage  │
└─────────────────────────────────────────────────────────────────┘

Data synchronization: The source of truth is PostgreSQL. Other stores are derived views — kept in sync via:

  • Change Data Capture (CDC): Debezium reads PostgreSQL WAL → publishes events to Kafka → consumers update Elasticsearch, ClickHouse
  • Dual writes (avoid): Writing to two systems in one transaction without distributed coordination risks inconsistency on partial failure
  • Event sourcing: All state changes are events; any database is materialized from the event log

Decision Framework

Work through these layers in order. Stop at the first layer where a clear answer emerges.

Is this a caching, session, or ephemeral data problem?

Key-Value store (Redis, Memcached). No further evaluation needed.

Does the data require full-text search as the primary access pattern?

Search engine (Elasticsearch, Typesense). Feed from a primary store via CDC.

Are relationships between entities the primary query pattern (multi-hop graph traversal)?

Graph database (Neo4j, Neptune). Otherwise, continue.

Is write throughput the primary constraint — millions of writes per second, append-only, time-ordered?

Wide-column store (Cassandra, Bigtable). Schema must be designed per query. Continue if you need transactions.

Is the schema highly variable or hierarchical with optional nested fields, and are queries always by document ID or indexed field?

Document store (MongoDB, Firestore). Continue if you need cross-document transactions.

Do you need ACID transactions, complex joins, or ad-hoc queries?

SQL (PostgreSQL, MySQL). Scale with read replicas → connection pooling → sharding (Vitess/Citus) → NewSQL if you need global distribution.

Do you need multi-entity ACID + horizontal write scale across regions?

NewSQL (CockroachDB, Spanner, TiDB).

Futuristic and Emerging Patterns

The decision framework above covers systems built today. These categories are becoming relevant for the next generation of systems:

CategoryWhat it isWhen to reach for it
Vector databases (Pinecone, Weaviate, Qdrant, pgvector)Index high-dimensional embedding vectors; nearest-neighbor search in embedding spaceAI/ML similarity search — semantic search, recommendation via embeddings, RAG (Retrieval-Augmented Generation) pipelines
Multi-model databases (SurrealDB, ArangoDB, Fauna)One database serving document, graph, key-value, and relational modelsSystems that span multiple access patterns without wanting to run multiple databases
Edge / embedded databases (Cloudflare D1, Turso, SQLite)SQLite-compatible, deployed at the edge (CDN PoP) close to the userUltra-low-latency reads for personalized, geo-specific data; edge computing workloads
Lakehouse (Delta Lake, Apache Iceberg, Hudi)ACID transactions on top of columnar files in object storage (S3)Unifying the data lake (cheap storage) with data warehouse query capabilities; time travel, schema evolution
HTAP databases (TiDB, SingleStore, Databricks)Hybrid transactional + analytical processing on the same dataEliminating the CDC pipeline between OLTP and OLAP; real-time analytics on live transactional data
AI-native storage (purpose-built for LLM training)Optimized for the access patterns of model checkpointing, feature stores, and training data pipelinesML platforms, LLM training infrastructure
ℹ️

In a system design interview, mentioning vector databases for AI-powered features (semantic search, recommendations using embeddings) signals awareness of modern architectures. Pinecone and Weaviate are managed; pgvector (PostgreSQL extension) is the pragmatic choice if you are already on PostgreSQL and embedding count is in the millions, not billions.

ℹ️

Interview tip: When asked “SQL or NoSQL?”, I’d refuse to answer until I’ve enumerated the access patterns: “What’s the read shape — point lookup, range, full-text, aggregation? What’s the relationship structure? Is the schema stable? Do we need multi-entity ACID?” Only then would I pick. I’d push back hard on “we chose NoSQL because it scales” — Instagram ran on PostgreSQL at a billion users, GitHub on MySQL — and I’d default to SQL unless a specific access pattern (high-write time-series, full-text, key-value caching) makes it unworkable. Real systems are polyglot: PostgreSQL for transactions, Elasticsearch for search, Redis for sessions, ClickHouse for analytics — kept in sync via CDC, not dual writes.

Test Your Understanding

A team picks MongoDB because ‘we don’t know the schema yet and NoSQL is schemaless.’ Why is this reasoning flawed?

“Schemaless” is a myth. MongoDB doesn’t enforce schema at the database level, but your application code always assumes a structure — it expects fields to exist, have specific types, and follow conventions. The schema is just moved from the database to the application, where it’s harder to enforce, version, and validate.

The real trade-off: Schema-on-write (SQL) catches errors at insert time. Schema-on-read (document stores) catches errors when the application reads the data — which may be months later in a different code path. Schema flexibility is useful when the data is genuinely polymorphic (varying attributes per document, like product catalogs), not when the team hasn’t designed the data model yet.

Better approach: Start with PostgreSQL + JSONB columns for the parts of the schema that are truly variable. You get ACID transactions, SQL joins, and schema flexibility where you need it.

Your system uses PostgreSQL for orders and Elasticsearch for search. An order is created but doesn’t appear in search for 30 seconds. What’s wrong with using dual writes to fix this, and what should you use instead?

Dual writes are unsafe. If you write to PostgreSQL and then to Elasticsearch, and the ES write fails (network error, ES down), the data is inconsistent — the order exists in PG but not in search. You can’t wrap both in a transaction because they’re different systems.

Even if both writes succeed, there’s no ordering guarantee. Under concurrent updates, PG might have version 3 while ES has version 2 — permanent inconsistency.

Fix: CDC (Change Data Capture). Use Debezium (an open-source tool that reads the database’s WAL/binlog) to capture every committed change from PostgreSQL and publish it to Kafka. An ES sink connector consumes from Kafka and updates Elasticsearch. This guarantees: (1) only committed data reaches ES, (2) ordering is preserved via the WAL, (3) if ES is down, events queue in Kafka and are applied when it recovers.

An architect proposes using Cassandra for a financial ledger because ‘it scales horizontally.’ What critical requirement does Cassandra lack for this use case?

Multi-row ACID transactions. A financial ledger requires that a debit and its corresponding credit are atomically committed (double-entry bookkeeping). Cassandra offers lightweight transactions (LWT) via Paxos, but only for single-partition conditional writes — not multi-partition atomic operations.

If the debit and credit are on different partition keys (different accounts), Cassandra cannot guarantee both succeed or both fail. A crash between the two writes leaves the ledger imbalanced.

Better choice: PostgreSQL (or CockroachDB/Spanner for global scale) for the ledger — ACID guarantees correctness. Use Cassandra for the access patterns it excels at: high-write, append-only, query-by-partition-key workloads like activity feeds, time-series metrics, or messaging.

A startup uses DynamoDB for everything — user profiles, orders, analytics. At Series B scale, engineers complain about expensive scatter-gather queries and inability to do ad-hoc reporting. What went wrong?

Access pattern mismatch. DynamoDB excels at known, simple access patterns — point lookups and range queries by partition key. But:

  1. Ad-hoc queries (“show me all orders over $100 in the last week grouped by region”) require full table scans, which are slow and expensive on DynamoDB.
  2. Cross-partition joins don’t exist — every query that touches multiple partition keys becomes a scatter-gather in application code.
  3. Analytics (aggregations, GROUP BY) must be done client-side after scanning, which is orders of magnitude slower than a columnar store.

The polyglot fix: Keep DynamoDB for what it’s good at (user sessions, high-throughput key-value access). Add PostgreSQL for transactional data that needs joins and ad-hoc queries. Add a columnar store (Athena, ClickHouse) for analytics. Sync via CDC.

Lesson: “NoSQL scales” is true for the access patterns it was designed for. Using it for access patterns it wasn’t designed for creates a distributed monolith — the worst of both worlds.