Deep Dive

1. Why Introduce a Message Broker (Kafka)? Is it Just for Asynchronicity?

You are absolutely right. Maintaining a Kafka cluster is a non-trivial operational task. It adds complexity to the system. The decision to include it is a classic architectural trade-off between operational simplicity and system resilience/scalability.

While the primary reason is indeed to make the update asynchronous, the benefits go far beyond just reducing the latency of the Match Service.

  • Decoupling and Resilience: This is the most critical advantage here. Imagine the Redis instance slows down or becomes temporarily unavailable.

    • Without Kafka: The Match Service would try to write directly to Redis. If Redis is slow, the API call to the Match Service would hang, leading to a poor user experience. If Redis is down, the score update would fail entirely, and the win would be lost forever unless complex retry logic is built into the Match Service itself.
    • With Kafka: The Match Service’s only job is to fire off a tiny message to Kafka and move on (fire-and-forget). This is an extremely fast and reliable operation. Kafka acts as a durable buffer: if the Score Update Service or Redis is down, the messages (match wins) simply accumulate safely in the Kafka topic. Once the downstream systems recover, the consumer catches up and processes the backlog. This prevents data loss and isolates failures, making the entire system more resilient.
  • Load Balancing and Backpressure Handling: Our system anticipates 5 million daily active users playing about 10 matches each, i.e., up to 50 million match results per day. That averages out to only ~580 writes per second, but the traffic pattern will be “spiky,” concentrated in evening peak hours rather than spread evenly.

    • Without Kafka: A sudden burst of wins would hammer Redis directly. While Redis is fast, it can still be overwhelmed.
    • With Kafka: Kafka naturally absorbs these bursts. The consumer (Score Update Service) can then process these messages from the queue at a steady, manageable pace that Redis can handle, effectively smoothing out the load and preventing the database from being overwhelmed. This is a form of backpressure handling.
  • Scalability of Consumers: The consumer’s job is simple: read a message and run a ZINCRBY command. By using Kafka, we can run multiple instances of the Score Update Service in a consumer group. If we partition our Kafka topic (e.g., by user_id), each consumer instance handles a subset of the updates in parallel, allowing us to scale the score update throughput horizontally (see the consumer sketch after this list).
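
A minimal sketch of one Score Update Service consumer instance, assuming a kafka-python consumer and a redis-py client; the topic name, consumer group, message fields, and hostnames are illustrative, not fixed by the design:

```python
import json

import redis
from kafka import KafkaConsumer  # kafka-python

# Illustrative names; the real topic, group, and hosts would come from configuration.
consumer = KafkaConsumer(
    "match-wins",
    bootstrap_servers=["kafka:9092"],
    group_id="score-update-service",   # every instance joins this group
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    enable_auto_commit=False,
)
r = redis.Redis(host="redis", port=6379)

for msg in consumer:
    win = msg.value  # assumed shape: {"user_id": "user_123", "points": 1, "month": "2025-07"}
    # Atomically add the points to the user's entry in the monthly sorted set.
    r.zincrby(f"leaderboard:{win['month']}", win["points"], win["user_id"])
    consumer.commit()  # commit the offset only after the Redis write succeeds
```

Committing the offset only after the Redis write gives at-least-once delivery: a crash between the ZINCRBY and the commit could replay (and double-count) a win, so a production version would either tolerate that or deduplicate on a match ID.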

Conclusion: Yes, maintaining Kafka adds overhead. However, for a system with 25 million MAU and a “real-time” requirement, the resilience, decoupling, and scalability benefits it provides far outweigh the operational complexity. It turns an unreliable, synchronous update process into a durable, asynchronous, and scalable one.

2. How Do You Maintain the Order of Updates?

This is an extremely important question, especially in systems where the sequence of events matters.

In Kafka, order is guaranteed within a partition. It does not guarantee order across different partitions of a topic.

Here’s how we leverage this for our use case:

  1. Partitioning Strategy: When the Match Service produces a message to the Kafka topic, it should specify a partition key. The ideal partition key here is the user_id. Kafka’s producer will ensure that all messages with the same user_id are always sent to the same partition (a producer sketch follows this list).
  2. Per-User Ordering: Because all updates for a single user (e.g., “User A wins a match”) go to the same partition, and a partition is consumed by only one consumer within a group at any given time, the order of updates for that specific user is guaranteed. User A’s 5th win will never be processed before their 4th win.
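
On the producer side, all the Match Service has to do is pass user_id as the message key; a kafka-python sketch (the topic name and message shape are my own illustration):

```python
import json
import time

from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_win(user_id: str, points: int = 1) -> None:
    # Keying by user_id means Kafka hashes the key to a partition, so all of
    # this user's wins land in the same partition and stay in order.
    producer.send(
        "match-wins",
        key=user_id,
        value={"user_id": user_id, "points": points, "ts": int(time.time())},
    )

publish_win("user_123")
producer.flush()  # in practice flushed in batches or on shutdown
```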

What about ordering between different users? The system does not guarantee that an update for “User A” that happened at T1 will be processed before an update for “User B” that happened at T2 (where T2 > T1). They might go to different partitions and be processed by different consumers.

Why is this acceptable for our Leaderboard? For a simple “total score” leaderboard, the absolute order of updates between different users doesn’t matter. The ZINCRBY operation in Redis is atomic. It increments the current value. Whether User A’s score is incremented before or after User B’s score has no impact on their final, correct totals. The end result is the same regardless of the inter-user processing order.

If we were implementing the tie-breaking mechanism (using a timestamp), the timestamp would be generated by the Match Service when the win occurs. This timestamp would be part of the Kafka message and used in the score calculation. So, even if events are processed out of chronological order, the timestamp ensures the “who got there first” logic is preserved correctly.
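
If we did adopt that tie-breaking mechanism, one possible encoding — and this is an assumption of mine rather than part of the design — is to pack the win count and an inverted timestamp into a single sorted-set score (scores are 64-bit doubles, so integers stay exact up to 2^53):

```python
# Hypothetical composite-score encoding for tie-breaking; the design only says
# a timestamp is carried in the Kafka message, not how it enters the score.
TS_RANGE = 10_000_000_000  # larger than any Unix timestamp (in seconds) we will see

def composite_score(total_wins: int, last_win_ts: int) -> int:
    # High-order part: total wins. Low-order part: inverted timestamp, so among
    # users with equal wins, whoever reached that count first scores higher.
    return total_wins * TS_RANGE + (TS_RANGE - last_win_ts)

# Both users have 5 wins, but user A reached 5 wins 500 seconds earlier:
a = composite_score(5, last_win_ts=1_730_000_000)
b = composite_score(5, last_win_ts=1_730_000_500)
assert a > b  # A ranks above B
```

Note that with a composite score the update is no longer a plain ZINCRBY: the consumer would recompute the score and write it back with ZADD (ideally inside a short Lua script so the read-modify-write stays atomic).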

3. How Does Redis Handle Deadlocks and Consistency?

This is a great question that dives into the internals of Redis.

  • No Deadlocks: Redis executes commands on a single thread by design (newer versions add I/O threads, but command execution remains strictly serial). It uses an event loop to handle client requests and processes one command at a time; while one command is executing, no other command can run. This completely eliminates the possibility of deadlocks on the Redis server itself, because two commands can never end up waiting on each other to release a lock.

  • Atomicity and Consistency:

    • Atomic Commands: Every individual Redis command, including ZINCRBY (increment the score of a member in a sorted set), executes atomically. When the Score Update Service sends a ZINCRBY leaderboard:2025-07 1 user_123 command, Redis guarantees the entire operation completes without any other command interleaving. You will never see a “partial” increment or a race condition at the command level.
    • Transactions (MULTI/EXEC): For operations that must modify multiple keys atomically, Redis provides transactions using MULTI and EXEC. For our primary use case of incrementing a score, however, the atomic ZINCRBY command is sufficient and more performant (both are shown in the sketch after this list).
    • Consistency: On a single Redis instance, consistency is strong (effectively linearizable, since the single-threaded event loop processes commands serially). In a distributed setup (which we discuss next), replication to replicas is asynchronous, so reads served by replicas are eventually consistent. Writes, however, are always directed to the primary node, which processes them serially, maintaining the integrity of the data.
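
To make this concrete, a small redis-py sketch of both cases; the leaderboard key is the one used throughout this design, while the multi-key transaction is purely illustrative:

```python
import redis

r = redis.Redis(host="redis", port=6379)

# A single ZINCRBY is atomic: no other command can interleave with it, so
# concurrent consumer instances can safely increment the same sorted set.
r.zincrby("leaderboard:2025-07", 1, "user_123")

# If we ever had to modify multiple keys as one unit (e.g. a hypothetical
# per-user stats hash alongside the leaderboard), a MULTI/EXEC transaction
# via a pipeline queues the commands and executes them atomically.
with r.pipeline(transaction=True) as pipe:
    pipe.zincrby("leaderboard:2025-07", 1, "user_123")
    pipe.hincrby("user:user_123:stats", "wins", 1)  # hypothetical key
    pipe.execute()  # MULTI ... EXEC
```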

4. Handling Redis Scalability and Resiliency

You correctly identified that Redis is the heart of the system and a potential single point of failure and bottleneck. We cannot rely on a single Redis instance in production for a system of this scale.

Here’s how we address this:

Resiliency and High Availability (Handling Failures)

We use Redis Sentinel.

  • What it is: Redis Sentinel is a separate system that provides high availability for Redis. You run several Sentinel processes alongside your Redis primary and replica instances.
  • How it works:
    1. Monitoring: Sentinels constantly monitor the health of the primary Redis instance.
    2. Automatic Failover: If the Sentinels agree that the primary is down, they will automatically perform a failover. They will elect one of the replica instances, promote it to be the new primary, and reconfigure the other replicas to follow the new primary.
    3. Service Discovery: Your application services (Score Update Service, Leaderboard Service) don’t connect directly to the Redis IP. Instead, they connect to the Sentinels and ask, “Who is the current primary for the leaderboard?” The Sentinels provide the correct address, so the application adapts to a failover automatically, without manual intervention (see the sketch after this list).
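
With redis-py, that service discovery is only a few lines; the Sentinel hosts and the service name “leaderboard” below are illustrative:

```python
from redis.sentinel import Sentinel

# Connect to the Sentinel processes, never to a hard-coded Redis primary.
sentinel = Sentinel(
    [("sentinel-1", 26379), ("sentinel-2", 26379), ("sentinel-3", 26379)],
    socket_timeout=0.5,
)

# Ask the Sentinels who the current primary is; after a failover this handle
# transparently points at the newly promoted primary.
primary = sentinel.master_for("leaderboard", socket_timeout=0.5)
primary.zincrby("leaderboard:2025-07", 1, "user_123")

# Reads that can tolerate slight staleness may be served by a replica.
replica = sentinel.slave_for("leaderboard", socket_timeout=0.5)
top10 = replica.zrevrange("leaderboard:2025-07", 0, 9, withscores=True)
```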

This setup transforms our single point of failure into a highly available, self-healing cluster.

Scalability (Handling Load and Data Size)

We use Redis Cluster. While a single large Redis node can handle a lot, for 25 million users, the data size and request load might exceed the capacity of a single machine.

  • What it is: Redis Cluster provides a way to run a Redis installation where data is automatically sharded (partitioned) across multiple Redis nodes.
  • How it works:
    1. Sharding: The entire key space is split into 16,384 “hash slots.” Each primary node in the cluster is responsible for a subset of these slots.
    2. Key-based Routing: When a client wants to run a command (e.g., ZINCRBY on leaderboard:2025-07), Redis Cluster applies a hash function (CRC16) to the key to determine which hash slot it belongs to. The client is then directed to the specific node that owns that slot. Most modern Redis clients handle this redirection automatically.
    3. Limitation: Sharding operates at the key level, so a single key always lives on a single node; our monthly sorted set (leaderboard:2025-07) cannot itself be split across the cluster. That node must therefore be able to hold the entire 25M-entry sorted set, which, as the sizing in question 5 shows, is well within the capacity of one machine. The cluster still helps by letting us place other data (e.g., user session caches) and other months’ leaderboards on different nodes, distributing the overall load (a client-side sketch follows this list).
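
A cluster-aware client hides the slot routing entirely; a sketch using redis-py’s RedisCluster (available in redis-py 4.x; hostnames are illustrative):

```python
from redis.cluster import RedisCluster

# The client fetches the slot map from any seed node and then routes each
# command to the node owning the key's hash slot, following MOVED redirects.
rc = RedisCluster(host="redis-node-1", port=6379)

# CRC16("leaderboard:2025-07") % 16384 picks the slot, so every command on
# this key always lands on the same node.
rc.zincrby("leaderboard:2025-07", 1, "user_123")
print(rc.zrevrange("leaderboard:2025-07", 0, 9, withscores=True))
```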

Combined Approach: In a production-grade setup, you would use Redis Cluster, and each primary node within that cluster would have its own replica for high availability, combining both sharding for scalability and replication for resiliency.

5. Does the Sorted Set Store All 25 Million Users?

Yes, for the monthly leaderboard, the length of the sorted set (leaderboard:YYYY-MM) will grow to hold all 25 million monthly active users who have a score greater than zero.

Is this a problem? No, not for Redis.

  • Memory Footprint: Let’s do a rough calculation. A sorted set entry consists of the member (user_id) and the score.

    • user_id: Let’s assume it’s a UUID string (36 bytes) or a long integer (8 bytes). Let’s take an average of 20 bytes.
    • score: An integer (8 bytes).
    • Overhead: Redis has some overhead per entry for the data structure itself (pointers, metadata). Let’s estimate a total of ~48-64 bytes per user.
    • Total Size: 25,000,000 users * 64 bytes/user ≈ 1,600,000,000 bytes ≈ 1.6 GB. This is a very manageable size for a modern server. A server with 8GB or 16GB of RAM can easily hold this sorted set in memory with plenty of room to spare for growth, other data, and operational overhead. The cost of this much RAM is negligible compared to the performance gains.
  • Performance: The efficiency of Redis Sorted Sets comes from their underlying implementation, which combines a hash table (for O(1) member lookups) with a skip list (a probabilistic data structure that allows O(log N) operations like ranking and range queries). Even with N = 25 million, log2(25,000,000) is approximately 25. Getting the top 10 or a user’s rank therefore remains extremely fast and is not a concern (see the read-path sketch below).
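
For completeness, the read path the Leaderboard Service relies on is just as simple; a redis-py sketch (key and user ID are illustrative):

```python
import redis

r = redis.Redis(host="redis", port=6379)
key = "leaderboard:2025-07"

# Top 10: a range query over the skip list, O(log N + 10).
top10 = r.zrevrange(key, 0, 9, withscores=True)

# One user's rank (0-based, highest score first) and current score.
rank = r.zrevrank(key, "user_123")  # O(log N) via the skip list
score = r.zscore(key, "user_123")   # O(1) via the hash table

print(top10, rank, score)
```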