Hash Index
A hash index maps each key through a hash function to a bucket, then stores a pointer to the row in that bucket. Lookups are O(1) average — there is no tree to traverse. The tradeoff is absolute: a hash index answers only equality predicates. It cannot range-scan, sort, or match prefixes. Database Indexes covers when to choose one over a B+ tree; this file covers how it works and where it appears in real systems.
Mechanics
Index on: users.email
hash("alice@example.com") → bucket 3 → [row_ptr: heap page 41, offset 7]
hash("bob@example.com") → bucket 7 → [row_ptr: heap page 12, offset 2]
hash("carol@example.com") → bucket 3 → [row_ptr: heap page 88, offset 0]
↑
collision — bucket 3 holds a chainHash function: maps an arbitrary-length key to a fixed-size integer (the bucket index). A good hash function distributes keys uniformly across buckets, minimizing collisions.
Collision resolution: Two approaches are common in databases:
| Method | How | Used by |
|---|---|---|
| Separate chaining | Each bucket holds a linked list of entries; a collision appends to the list | PostgreSQL hash index |
| Open addressing / linear probing | On collision, probe the next empty bucket in sequence | In-memory hash tables, InnoDB AHI |
Lookup:
- Hash the query key → bucket number
- Read the bucket → get one or more (key, row_ptr) pairs
- Compare keys for equality (needed because different keys can hash to the same bucket)
- Follow the matching row pointer to the heap page
Average O(1), worst-case O(n): With a uniform hash function and load factor < 0.75, collisions are rare and chaining stays short. A pathological key distribution (many keys hashing to the same bucket) degrades to O(n) — this is why hash function quality matters.
What Hash Index Supports
| Query | Hash index? | Why |
|---|---|---|
WHERE email = 'alice@example.com' | ✅ | Single hash computation → O(1) bucket lookup |
WHERE id IN (1, 5, 9) | ✅ | Three separate hash lookups |
WHERE age > 30 | ❌ | Hash output has no ordering — cannot find “greater than” |
WHERE age BETWEEN 20 AND 30 | ❌ | Range requires ordering |
WHERE name LIKE 'ali%' | ❌ | Prefix requires knowing hash of all matching keys upfront |
ORDER BY email | ❌ | Hash output is not sorted |
WHERE email IS NULL | ❌ (PostgreSQL) | NULL is not hashed into the index |
The inability to support range queries or ordering is not a limitation that can be engineered around — it is fundamental to how hashing works. Hashing intentionally destroys key ordering to achieve uniform distribution. If you need ordering, you need a B+ tree.
B+ Tree vs Hash Index
| B+ Tree | Hash Index | |
|---|---|---|
| Equality lookup | O(log n) | O(1) average |
| Range query | ✅ | ❌ |
| ORDER BY | ✅ | ❌ |
| Prefix LIKE | ✅ | ❌ |
| Composite index | ✅ (leftmost prefix rule) | ❌ (hashes the whole key) |
| Index size | Larger (internal nodes + leaf nodes) | Smaller (flat bucket array + chains) |
| Write cost | Moderate (B+ tree node updates, possible splits) | Low (hash + append to chain) |
| On-disk support | ✅ (default everywhere) | ✅ PostgreSQL; limited elsewhere |
| In-memory performance | Fast | Faster |
The practical conclusion: B+ tree’s O(log n) vs hash’s O(1) is not a meaningful difference in practice — a 3-level B+ tree with a 400-key fan-out reaches any of 64 million rows in 3 I/Os. The real reason to choose a hash index is workload-specific: pure equality joins on a high-cardinality key where range queries, sorting, and prefix filters are never needed.
PostgreSQL Hash Index
PostgreSQL has had hash indexes since version 7.x, but they were not WAL-logged until PostgreSQL 10 (2017). Before v10, a hash index had to be manually rebuilt after a crash — making them unsafe for production. Since v10, they are fully durable and usable.
-- Create a hash index
CREATE INDEX idx_users_email_hash
ON users USING HASH (email);
-- The query optimizer will use it for equality predicates:
SELECT * FROM users WHERE email = 'alice@example.com';
-- → Index Scan using idx_users_email_hash (cost is lower than B-tree for pure equality)
-- The optimizer will NOT use it for:
SELECT * FROM users WHERE email > 'alice@example.com'; -- range → B-tree fallback
SELECT * FROM users ORDER BY email; -- sort → seq scan or B-treePostgreSQL hash index internals — overflow pages:
PostgreSQL implements the hash index as a file of fixed-size 8 KB pages, divided into primary bucket pages and overflow pages. When a bucket fills up, an overflow page is chained to it. Unlike a B-tree, there are no internal routing nodes — the bucket number is computed directly from the hash value and the current number of buckets.
Splits: When the load factor exceeds a threshold, the index doubles its bucket count (a split). During a split, half the entries in the overfull bucket are redistributed to a new bucket. This is online and page-level — it does not lock the entire index.
PostgreSQL’s query planner rarely chooses a hash index over a B-tree unless the table is large, the column has very high cardinality, and the workload is strictly equality-only. In practice, B-tree handles equality fast enough that the hash index advantage is marginal except at very high QPS on in-memory datasets.
InnoDB Adaptive Hash Index (AHI)
MySQL InnoDB does not let you create hash indexes on user tables. Instead, it builds one automatically and internally — the Adaptive Hash Index (AHI).
InnoDB Buffer Pool
┌────────────────────────────────────────────────────────┐
│ B+ Tree leaf pages (data) │
│ B+ Tree internal pages (routing) │
│ │
│ Adaptive Hash Index │
│ ┌──────────────────────────────────────────────┐ │
│ │ hash(index_key) → pointer to B+ tree leaf │ │
│ │ page already in buffer pool│ │
│ └──────────────────────────────────────────────┘ │
│ (built and destroyed automatically) │
└────────────────────────────────────────────────────────┘InnoDB observes which B+ tree leaf pages are accessed repeatedly. For frequently accessed key patterns, it builds an in-memory hash table mapping that key directly to the leaf page that is already in the buffer pool. On the next identical lookup, the engine skips B+ tree traversal entirely and jumps straight to the leaf page.
Key characteristics:
- Enabled by default (
innodb_adaptive_hash_index = ON) - Entirely in-memory — not persisted; rebuilt from buffer pool contents on restart
- Partitioned into
innodb_adaptive_hash_index_partsshards (default 8) to reduce lock contention - Automatically disabled for key patterns that do not benefit
- Can be disabled completely:
SET GLOBAL innodb_adaptive_hash_index = OFF
When AHI hurts: On workloads with high key variety (sequential scans, many distinct keys), the AHI hash table sees constant churn — entries are added and evicted before they are reused. The overhead of maintaining the AHI exceeds its benefit. A symptom is high contention on rw_lock_x_lock: btr_search_latch in SHOW ENGINE INNODB STATUS.
The AHI is a buffer-pool-level optimization, not a storage-level index. It accelerates access to pages that are already in memory. It provides no benefit for cold data that must be read from disk — a B+ tree traversal is always required on a cold read.
Bitcask: Hash Indexes in a Storage Engine
Bitcask (the default storage engine in Riak) demonstrates what a hash index looks like as the entire storage engine architecture — not just an auxiliary index on a table.
Design:
- All writes are append-only to a sequential log file (active file). Random I/O is eliminated entirely.
- An in-memory hash table maps every key →
{file_id, byte_offset, value_size}. - Reads: hash the key → get file+offset → one disk seek → read value.
Write("k1", "v1"): Active log file In-memory keydir
Append to log → [k1|v1][k2|v2][k1|v3] {k1 → (file=1, offset=8)}
Update keydir {k2 → (file=1, offset=2)}
Read("k1"):
keydir["k1"] → file=1, offset=8
pread(file=1, offset=8, len=...) → "v3"Compaction: Log files grow without bound. A background process rewrites each segment, keeping only the latest value per key and discarding all older versions. After compaction, the old files are deleted and the keydir is updated.
What Bitcask gives you:
- Write throughput limited only by sequential disk I/O (near disk saturation)
- O(1) reads with exactly one disk seek (assuming keydir fits in RAM)
- Simple crash recovery: re-scan the log, or use the stored keydir snapshot (
hint files)
Bitcask’s hard constraint: The entire keydir must fit in RAM. If you have more distinct keys than fit in memory, Bitcask is not usable regardless of how much disk you have. This is not a tunable tradeoff — it is the architectural price of O(1) in-memory hash lookups.
| Bitcask | LSM Tree | B+ Tree | |
|---|---|---|---|
| Write pattern | Sequential append | Sequential (memtable → SSTable) | Random in-place page write |
| Read cost | 1 disk seek (with hint file) | Multiple SSTable checks | O(log n) tree traversal |
| Range queries | ❌ | ✅ (within SSTable) | ✅ (linked leaf list) |
| RAM requirement | All keys must fit | Configurable (memtable + Bloom filters) | OS page cache (partial) |
| Used by | Riak, simple KV engines | Cassandra, RocksDB, LevelDB | PostgreSQL, MySQL, Oracle |
When to Use a Hash Index
| Scenario | Use hash index? |
|---|---|
| Pure equality lookups on a UUID / token / email column, no range needed | ✅ |
| Very high QPS in-memory workload where O(1) vs O(log n) matters | ✅ |
| Any column that is also ORDER BY, BETWEEN, or LIKE queried | ❌ |
Composite lookup (WHERE a = ? AND b = ?) — need leftmost prefix flexibility | ❌ (use B+ tree composite) |
| Default general-purpose index | ❌ (use B+ tree) |
The default for any new index should be a B+ tree. Reach for a hash index only when profiling confirms that equality-only lookups on a specific high-cardinality column are a measured bottleneck, and range/sort queries on that column will never exist.