Distributed Unique ID Generator
The goal is to design a Distributed ID Generation System that can generate globally unique IDs across multiple services and datacenters without coordination. This system should be highly available, fault-tolerant, and able to generate IDs at high throughput. Twitter’s Snowflake ID is a popular example, but we will generalize its concept.
Requirements:
- Uniqueness: The ID should be globally unique.
- Scalability: The system should scale across multiple servers and datacenters.
- High throughput: The system should be able to generate a large number of IDs per second.
- Time-ordering: IDs should be roughly ordered based on the time they were generated.
- Fault-tolerance: The system should be able to tolerate failures of individual servers.
ID Structure:
We will structure the ID as a 64-bit integer, which consists of:
- Timestamp (in milliseconds): 41 bits (enough for ~69 years from the start time)
- Datacenter ID: 5 bits (32 datacenters)
- Machine ID (or Worker ID): 5 bits (32 machines per datacenter)
- Sequence number: 12 bits (4096 IDs per millisecond per machine)
Example ID Structure:
| 41 bits (timestamp) | 5 bits (datacenter ID) | 5 bits (machine ID) | 12 bits (sequence) |
Object-Oriented Design:
1. IDGenerator Class:
This is the main class responsible for generating unique IDs. It combines the timestamp, datacenter ID, machine ID, and sequence to form a unique ID.
import time
from threading import Lock
class IDGenerator:
def __init__(self, datacenter_id: int, machine_id: int, epoch: int):
self.datacenter_id = datacenter_id
self.machine_id = machine_id
self.epoch = epoch # Custom epoch, in milliseconds
# Bit allocations
self.timestamp_bits = 41
self.datacenter_bits = 5
self.machine_bits = 5
self.sequence_bits = 12
# Maximum values
self.max_datacenter_id = (1 << self.datacenter_bits) - 1
self.max_machine_id = (1 << self.machine_bits) - 1
self.max_sequence = (1 << self.sequence_bits) - 1
# Shifts for each component
self.datacenter_shift = self.sequence_bits + self.machine_bits
self.machine_shift = self.sequence_bits
self.timestamp_shift = self.sequence_bits + self.machine_bits + self.datacenter_bits
# State
self.sequence = 0
self.last_timestamp = -1
self.lock = Lock()
# Validation
if not (0 <= datacenter_id <= self.max_datacenter_id):
raise ValueError(f"Datacenter ID must be between 0 and {self.max_datacenter_id}")
if not (0 <= machine_id <= self.max_machine_id):
raise ValueError(f"Machine ID must be between 0 and {self.max_machine_id}")
def _current_timestamp(self):
return int(time.time() * 1000) # Current time in milliseconds
def _wait_for_next_millis(self, last_timestamp):
# Wait for the next millisecond to avoid duplicate timestamps
timestamp = self._current_timestamp()
while timestamp <= last_timestamp:
timestamp = self._current_timestamp()
return timestamp
def generate_id(self) -> int:
with self.lock:
timestamp = self._current_timestamp()
if timestamp < self.last_timestamp:
raise Exception("Clock moved backwards. Unable to generate ID.")
if timestamp == self.last_timestamp:
# Increment the sequence number
self.sequence = (self.sequence + 1) & self.max_sequence
if self.sequence == 0:
# Sequence overflow, wait for the next millisecond
timestamp = self._wait_for_next_millis(self.last_timestamp)
else:
# Reset sequence for a new millisecond
self.sequence = 0
self.last_timestamp = timestamp
# Generate the ID
id_value = ((timestamp - self.epoch) << self.timestamp_shift) | \
(self.datacenter_id << self.datacenter_shift) | \
(self.machine_id << self.machine_shift) | \
self.sequence
return id_value
2. Datacenter Class:
Represents a datacenter that manages multiple workers (machines). Each datacenter is assigned a unique ID.
class Datacenter:
def __init__(self, datacenter_id: int, workers: int, epoch: int):
self.datacenter_id = datacenter_id
self.workers = [IDGenerator(datacenter_id, machine_id, epoch) for machine_id in range(workers)]
def get_worker(self, machine_id: int) -> IDGenerator:
if machine_id < 0 or machine_id >= len(self.workers):
raise ValueError("Invalid machine ID")
return self.workers[machine_id]
3. System Class:
This class manages multiple datacenters and facilitates the distribution of ID generators to various clients.
class IDSystem:
def __init__(self, datacenters: int, workers_per_datacenter: int, epoch: int):
self.datacenters = [Datacenter(datacenter_id, workers_per_datacenter, epoch) for datacenter_id in range(datacenters)]
def get_generator(self, datacenter_id: int, machine_id: int) -> IDGenerator:
if datacenter_id < 0 or datacenter_id >= len(self.datacenters):
raise ValueError("Invalid datacenter ID")
return self.datacenters[datacenter_id].get_worker(machine_id)
Key Classes and Their Responsibilities:
IDGenerator
: Generates a globally unique ID based on the timestamp, datacenter ID, machine ID, and sequence number. It ensures that no two IDs generated by the same machine in the same millisecond are the same.Datacenter
: Represents a datacenter containing multiple machines. It manages the ID generation on those machines.IDSystem
: Represents the entire distributed ID generation system. It manages multiple datacenters and facilitates the retrieval of ID generators for different machines.
Design Details:
- Thread Safety: The
generate_id
method in theIDGenerator
class is thread-safe using a lock to ensure no race conditions when multiple threads try to generate an ID concurrently. - Fault-tolerance: Since each machine independently generates IDs, even if some machines fail, others continue to function without coordination.
- Scalability: This design scales horizontally by adding more datacenters and machines. The system supports up to 32 datacenters and 32 machines per datacenter.
- Time-ordering: The 41-bit timestamp ensures that IDs are roughly time-ordered.
Usage Example:
if __name__ == "__main__":
epoch = 1609459200000 # Custom epoch (Jan 1, 2021, 00:00:00 UTC)
# Initialize a system with 3 datacenters, each having 5 machines
system = IDSystem(datacenters=3, workers_per_datacenter=5, epoch=epoch)
# Get the ID generator for datacenter 1, machine 2
id_generator = system.get_generator(datacenter_id=1, machine_id=2)
# Generate an ID
unique_id = id_generator.generate_id()
print(f"Generated ID: {unique_id}")
Further Considerations:
- Clock synchronization: If clocks across machines are not synchronized, we may need mechanisms to detect and handle clock drifts (e.g., retries or system-wide time synchronization like NTP).
- Scaling limits: With 5 bits for datacenter and machine IDs, the system is limited to 32 datacenters and 32 machines per datacenter. If more machines/datacenters are required, the bit allocation can be adjusted (e.g., reducing sequence bits).
This design is based on Snowflake’s principles but is generalized for broader use cases.