Presence & Online Status

Your chat app shows a green dot next to each contact who is currently online. User A opens the app, sees 47 friends online, and starts a conversation. Meanwhile, user B’s phone loses signal in an elevator, and 30 seconds later their dot turns gray. This seems simple until you consider the scale: 500 million users, each with hundreds of contacts, and a requirement to detect presence changes within seconds without drowning the infrastructure in unnecessary updates.

Heartbeat-Based Presence Detection

The most common approach: each connected client sends a lightweight heartbeat message to the server at a fixed interval. If the server stops receiving heartbeats, it marks the user as offline after a grace period.

Client sends heartbeat every 5 seconds.
Server expects heartbeat within 15 seconds (3 missed heartbeats = offline).

Timeline:
  t=0    heartbeat ✓   → online
  t=5    heartbeat ✓   → online
  t=10   heartbeat ✓   → online
  t=15   (missed)      → still online (grace period)
  t=20   (missed)      → still online (grace period)
  t=25   (missed)      → OFFLINE (3 missed = 15s since last heartbeat)
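
On the client side, the loop is minimal. A sketch using the websockets library (the library choice, function name, and URL are illustrative assumptions, not part of the protocol):

import asyncio
import websockets  # assumption: any WebSocket client library with ping support works

HEARTBEAT_INTERVAL = 5  # seconds; pairs with the server's 15s (3 missed) timeout

async def heartbeat_loop(url: str = "wss://chat.example.com/ws"):  # hypothetical endpoint
    async with websockets.connect(url) as ws:
        while True:
            # A ping frame (or a 1-byte message) is enough; no JSON payload needed
            await ws.ping()
            await asyncio.sleep(HEARTBEAT_INTERVAL)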

Why Not Just Use WebSocket Connection State?

A WebSocket disconnect event seems like a natural signal. The moment the TCP connection drops, the server knows the user is gone. But in practice:

| Signal | Problem |
| --- | --- |
| TCP FIN received | Works for clean disconnects (user closes the app). Does not work for abrupt network loss, where TCP keepalive detection takes minutes (or hours at OS defaults), not seconds. |
| TCP RST received | Only if the OS sends a reset. Many mobile disconnects leave the TCP state dangling on the server for 30–120 seconds. |
| WebSocket close frame | Only sent on graceful shutdown. Crash, kill, or network loss = no close frame. |

Heartbeats solve all of these: if the client stops heartbeating for any reason — app crash, network loss, battery death — the server detects absence within the grace period.

Heartbeat Interval Trade-offs

| Interval | Grace period (3×) | Detection speed | Overhead per user |
| --- | --- | --- | --- |
| 3s | 9s | Fast | High (0.33 msg/s) |
| 5s | 15s | Moderate | Moderate (0.2 msg/s) |
| 10s | 30s | Slow | Low (0.1 msg/s) |
| 30s | 90s | Very slow | Minimal (0.03 msg/s) |

At 500M concurrent users with a 5s heartbeat: 100 million heartbeats per second. This is a significant load. The heartbeat must be as lightweight as possible — a single byte or a WebSocket ping frame, not a full JSON payload.
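
Server side, a minimal tracker might look like the sketch below (assuming an asyncio Redis client such as redis-py's redis.asyncio):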

import time
import asyncio

class PresenceTracker:
    """Server-side presence tracking using heartbeat timestamps."""

    def __init__(self, redis, heartbeat_timeout=15):
        self.redis = redis
        self.heartbeat_timeout = heartbeat_timeout

    async def heartbeat(self, user_id: str):
        """Record heartbeat — set key with TTL so it auto-expires."""
        key = f"presence:{user_id}"
        now = int(time.time())
        pipe = self.redis.pipeline()
        pipe.set(key, now)
        pipe.expire(key, self.heartbeat_timeout)
        await pipe.execute()

    async def is_online(self, user_id: str) -> bool:
        """Check if user has a non-expired presence key."""
        return await self.redis.exists(f"presence:{user_id}") == 1

    async def get_last_seen(self, user_id: str) -> int | None:
        """Return last heartbeat timestamp, or None if expired."""
        val = await self.redis.get(f"presence:{user_id}")
        return int(val) if val else None

The key insight: Redis TTL is the heartbeat timeout. If the user heartbeats within the TTL window, the key gets refreshed. If not, Redis automatically deletes the key — no background cleanup job needed. Checking if a user is online is a single EXISTS call — O(1).

Last-Seen Storage

Not every application needs real-time “green dot” presence. Many show “last seen 5 minutes ago” — a simpler problem that requires only storing the most recent activity timestamp.

from datetime import datetime

# Redis hash: single key, one field per user
# More memory-efficient than individual keys for large user sets
await redis.hset("last_seen", user_id, int(time.time()))

# Read last seen
ts = await redis.hget("last_seen", user_id)
last_seen = datetime.fromtimestamp(int(ts)) if ts else None

| Storage | Pros | Cons |
| --- | --- | --- |
| Redis key per user (presence:{user_id}) | TTL-based auto-expiry, O(1) lookup | Per-key memory overhead (~50–80 bytes each) |
| Redis hash (last_seen → user_id → timestamp) | Compact for millions of users, single key | No per-field TTL — need cleanup job for stale entries |
| Database column (users.last_seen_at) | Persistent, queryable, no extra infra | Too slow for real-time heartbeat writes (disk I/O per heartbeat) |

Production pattern: Use Redis key-per-user with TTL for real-time presence detection, and asynchronously update a last_seen_at column in the database when the user goes offline (on TTL expiry via Redis keyspace notifications or a periodic sync job).
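
A sketch of the expiry-driven sync, assuming keyspace notifications are enabled (notify-keyspace-events Ex), Redis database 0, and a hypothetical db.update_last_seen helper:

import time

async def sync_last_seen_on_expiry(redis, db):
    """When a presence:* key expires, the user just went offline:
    persist an approximate last_seen_at to the database.
    Assumes Redis was configured with: CONFIG SET notify-keyspace-events Ex"""
    pubsub = redis.pubsub()
    await pubsub.psubscribe("__keyevent@0__:expired")  # expiry events for DB 0
    async for message in pubsub.listen():
        if message["type"] != "pmessage":
            continue
        key = message["data"]
        if isinstance(key, bytes):
            key = key.decode()
        if key.startswith("presence:"):
            user_id = key.split(":", 1)[1]
            # The value is gone once the key expires, so approximate:
            # the last heartbeat arrived roughly heartbeat_timeout seconds ago
            await db.update_last_seen(user_id, int(time.time()) - 15)  # hypothetical helper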

The Fan-Out Problem

When user A comes online, who needs to know? All of A’s contacts who are currently online and have the chat app open. For a user with 500 contacts, that’s potentially 500 notifications. For a celebrity with 10 million followers, it’s a storm.

Naive Approach: Broadcast to All Contacts

User A comes online
→ Fetch A's contact list: [B, C, D, E, ..., N]  (500 contacts)
→ For each contact:
    → Check if contact is online
    → If online, find their WebSocket connection server
    → Push "A is now online" notification

Problem: This is O(contacts) work per status change. At 500M users each changing status several times per day, the volume of fan-out messages is enormous. Worse: users who rapidly toggle between online/offline (flapping) cause amplified fan-out.

Practical Solution: Subscribe on View, Not Globally

Instead of pushing presence to all contacts at all times, only push presence updates to users who are actively viewing a screen that shows the contact’s status.

sequenceDiagram
    participant A as User A
    participant S as Presence Service
    participant PS as Pub/Sub Layer
    participant B as User B

    Note over B: B opens chat list showing contacts including A
    B->>S: Subscribe to presence of [A, C, D, ...]
    S->>PS: Add B to channel "presence:A"
    Note over A: A opens the app → heartbeat starts
    A->>S: Heartbeat (online)
    S->>PS: Publish "A: online" to channel "presence:A"
    PS->>B: "A is now online"
    Note over B: B navigates away from chat list
    B->>S: Unsubscribe from presence of [A, C, D, ...]
    S->>PS: Remove B from channel "presence:A"

This limits fan-out to only the users who care right now — typically a small subset of any user’s contact list. Users who have the app in the background or are on a different screen don’t receive unnecessary updates.
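
A sketch of how a connection server might track these view-scoped subscriptions in memory (class and method names are illustrative, not a fixed API):

from collections import defaultdict

class PresenceSubscriptions:
    """Per-server map of which local connections are watching whose presence."""

    def __init__(self):
        # contact_id -> set of local connection ids currently viewing that contact
        self.watchers: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, viewer_conn_id: str, contact_ids: list[str]):
        """Called when a client opens a screen that shows these contacts."""
        for contact_id in contact_ids:
            self.watchers[contact_id].add(viewer_conn_id)

    def unsubscribe(self, viewer_conn_id: str, contact_ids: list[str]):
        """Called when the client navigates away (or disconnects)."""
        for contact_id in contact_ids:
            self.watchers[contact_id].discard(viewer_conn_id)
            if not self.watchers[contact_id]:
                del self.watchers[contact_id]  # nobody on this server cares anymore

    def local_watchers(self, contact_id: str) -> set[str]:
        """Connections on this server that should receive contact_id's status."""
        return self.watchers.get(contact_id, set())

When a contact's watcher set goes from empty to non-empty, the server can also subscribe to that contact's pub/sub channel (and unsubscribe when the set drains), which keeps the cross-server routing described next scoped the same way.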

Cross-Server Routing

With millions of WebSocket connections spread across hundreds of connection servers, user A’s status change on server 1 must reach user B on server 47. The presence service cannot maintain a direct mapping of every user to every server.

flowchart TB
    subgraph Connection Layer
        WS1[WebSocket Server 1<br/>Users: A, X, Y]
        WS2[WebSocket Server 2<br/>Users: B, Z]
        WS3[WebSocket Server 3<br/>Users: C, D]
    end
    subgraph Pub/Sub Layer
        R1[Redis Pub/Sub<br/>or Kafka]
    end
    subgraph Presence Service
        PS[Presence Tracker<br/>Redis TTL keys]
    end
    WS1 -->|"A heartbeat"| PS
    PS -->|"publish: A online"| R1
    R1 -->|"A online"| WS2
    R1 -->|"A online"| WS3
    WS2 -->|"deliver to B<br/>(subscribed to A)"| WS2
    WS3 -->|"deliver to C<br/>(subscribed to A)"| WS3

How it works:

  1. Each WebSocket server subscribes to the pub/sub channels for the users whose presence its connected clients care about
  2. When the presence service detects a status change (online → offline or vice versa), it publishes to the user’s presence channel
  3. Only WebSocket servers with subscribers on that channel receive the message
  4. The receiving WebSocket server pushes the update to the specific connected clients

Redis Pub/Sub vs Kafka for Presence

| Criteria | Redis Pub/Sub | Kafka |
| --- | --- | --- |
| Latency | Sub-millisecond | 5–50ms (batching) |
| Delivery | Fire-and-forget (no persistence) | Persistent log, replay possible |
| Missed messages | Lost if subscriber is down | Consumer reads from offset after recovery |
| Scale | Single Redis node: ~1M messages/s | Partitioned: millions of messages/s |
| Use for presence | Good — presence is ephemeral, loss is acceptable | Overkill unless you need audit log of status changes |

Redis Pub/Sub is the typical choice for presence: the data is ephemeral (a missed “online” notification is harmless — the client can poll on next screen load), and the latency is lower.
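
A sketch of both sides of that channel, assuming redis-py's asyncio pub/sub, the PresenceSubscriptions map from the earlier sketch, and a hypothetical connections dict mapping connection id to WebSocket:

import json

async def publish_status_change(redis, user_id: str, status: str):
    """Presence-service side: announce a transition on the user's channel."""
    await redis.publish(
        f"presence:{user_id}", json.dumps({"user_id": user_id, "status": status})
    )

async def forward_presence(redis, subs, connections, watched_user_ids):
    """WebSocket-server side: subscribe only to channels that someone connected
    here is watching, then push each update to the local watchers."""
    pubsub = redis.pubsub()
    await pubsub.subscribe(*(f"presence:{uid}" for uid in watched_user_ids))
    async for message in pubsub.listen():
        if message["type"] != "message":
            continue
        payload = json.loads(message["data"])
        for conn_id in subs.local_watchers(payload["user_id"]):
            ws = connections.get(conn_id)
            if ws is not None:
                await ws.send(json.dumps(payload))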

Flapping Prevention

A user on a flaky mobile connection might alternate between online and offline every few seconds. Without mitigation, this generates a storm of status change notifications.

Without debounce:
  t=0   online  → notify 200 subscribers
  t=3   offline → notify 200 subscribers
  t=5   online  → notify 200 subscribers
  t=8   offline → notify 200 subscribers
  ...
  = 800 notifications in 10 seconds for one user

Solution: Debounce offline transitions. When heartbeats stop, wait an extra grace period before declaring the user offline and notifying subscribers.

class DebouncedPresence:
    """Delay offline notification to prevent flapping."""

    def __init__(self, redis, offline_delay=30, heartbeat_timeout=15):
        self.redis = redis
        self.offline_delay = offline_delay
        self.heartbeat_timeout = heartbeat_timeout

    async def heartbeat(self, user_id: str):
        was_pending_offline = await self.redis.exists(
            f"pending_offline:{user_id}"
        )
        pipe = self.redis.pipeline()
        pipe.set(f"presence:{user_id}", int(time.time()))
        pipe.expire(f"presence:{user_id}", self.heartbeat_timeout)
        # Cancel any pending offline notification
        pipe.delete(f"pending_offline:{user_id}")
        await pipe.execute()

        if was_pending_offline:
            # User came back before offline was broadcast — no notification
            pass

    async def mark_pending_offline(self, user_id: str):
        """Called when heartbeat TTL expires (via keyspace notification).
        Schedule offline notification after delay."""
        await self.redis.set(
            f"pending_offline:{user_id}", 1, ex=self.offline_delay
        )
        # When pending_offline TTL expires → truly offline → notify subscribers

    async def confirm_offline(self, user_id: str):
        """Called when pending_offline TTL expires.
        User did not heartbeat during the delay — genuinely offline."""
        # publish_status_change is assumed to publish on the user's presence
        # pub/sub channel (see the fan-out section); it is not defined in this sketch
        await self.publish_status_change(user_id, "offline")

The pattern: heartbeat TTL expires → set pending_offline with a 30-second TTL → if heartbeat resumes, delete pending_offline (no notification sent) → if pending_offline expires, publish offline status.

Scaling Presence to Hundreds of Millions

Architecture Overview

flowchart TB
    subgraph Clients
        C1([Mobile App])
        C2([Web App])
        C3([Desktop App])
    end

    subgraph Edge
        LB[Load Balancer<br/>Sticky Sessions]
    end

    subgraph Connection Tier
        WS1[WS Server 1]
        WS2[WS Server 2]
        WSN[WS Server N]
    end

    subgraph Presence Tier
        PT[Presence Tracker<br/>Redis Cluster]
    end

    subgraph Pub/Sub Tier
        PS[Redis Pub/Sub<br/>Sharded by user_id]
    end

    C1 & C2 & C3 --> LB
    LB --> WS1 & WS2 & WSN
    WS1 & WS2 & WSN -->|heartbeat| PT
    PT -->|status change| PS
    PS -->|fan-out| WS1 & WS2 & WSN

Scaling Dimensions

| Component | Scaling strategy |
| --- | --- |
| WebSocket servers | Horizontal — add more servers behind sticky load balancer; each holds N connections in memory |
| Presence store (Redis) | Redis Cluster with hash slots; shard by user_id; each shard handles ~1M users |
| Pub/Sub | Shard pub/sub channels across Redis instances by user_id hash; avoids single-node bottleneck |
| Heartbeat ingestion | WebSocket servers batch heartbeats locally (e.g., every 1s flush) before writing to Redis — reduces Redis ops |

Batching Heartbeats

At 500M users × 0.2 heartbeats/s = 100M Redis writes/s. A single Redis cluster cannot absorb this.

Local batching: each WebSocket server collects heartbeats from its local connections and flushes them in a single pipeline command every 1 second:

class HeartbeatBatcher:
    def __init__(self, redis, flush_interval=1.0):
        self.redis = redis
        self.flush_interval = flush_interval
        self.pending = set()

    def record(self, user_id: str):
        self.pending.add(user_id)

    async def flush(self):
        """Called every flush_interval seconds (e.g. from a background task)."""
        if not self.pending:
            return
        # Swap the set first so heartbeats recorded during the await below aren't lost
        batch, self.pending = self.pending, set()
        pipe = self.redis.pipeline()
        now = int(time.time())
        for user_id in batch:
            key = f"presence:{user_id}"
            pipe.set(key, now)
            pipe.expire(key, 15)
        await pipe.execute()

If a WebSocket server holds 100K connections, this collapses roughly 20K individual Redis round-trips per second (100K × 0.2) into a single pipelined batch of about 20K commands each second, which is orders of magnitude fewer round-trips to Redis.

Consistency Trade-offs

Presence is one of the rare cases where eventual consistency is genuinely acceptable. A user appearing “online” for 15 extra seconds after closing the app is a negligible UX issue. A user appearing “offline” for 5 seconds after opening the app is similarly harmless.

| Consistency level | Behavior | Cost |
| --- | --- | --- |
| Strong | All observers see the exact same status at the same instant | Requires distributed consensus per status change — unacceptable latency and throughput |
| Eventual (with bounded staleness) | Observers converge within heartbeat interval + propagation delay (~5–15s) | Pub/sub + Redis TTL — scalable, simple |
| Best-effort | Observers may miss transient status changes entirely | Fire-and-forget pub/sub — cheapest, acceptable for “last seen” |

Production systems (WhatsApp, Telegram, Discord) use eventual consistency with bounded staleness. Users tolerate a few seconds of stale presence. The system optimizes for throughput and simplicity over perfect accuracy.

⚠️ Do not use strong consistency for presence. Requiring linearizable reads for “is user X online?” would force every status check through a consensus protocol — destroying throughput for a feature where staleness is perfectly acceptable. This is a textbook case of matching the consistency model to the business requirement.

Multi-Device Presence

Modern users are logged in on multiple devices simultaneously — phone, tablet, laptop. The presence system must handle this correctly.

Rule: A user is online if any of their devices is online. A user is offline only when all devices are offline.

async def heartbeat_multi_device(redis, user_id: str, device_id: str):
    """Track presence per device. User is online if any device is active."""
    key = f"presence:{user_id}:devices"
    await redis.hset(key, device_id, int(time.time()))
    await redis.expire(key, 30)  # overall key TTL as safety net

async def is_online_multi_device(redis, user_id: str, timeout=15) -> bool:
    """User is online if any device heartbeated within timeout."""
    devices = await redis.hgetall(f"presence:{user_id}:devices")
    now = int(time.time())
    for device_id, last_seen in devices.items():
        if now - int(last_seen) < timeout:
            return True
    return False

async def cleanup_stale_devices(redis, user_id: str, timeout=15):
    """Remove devices that haven't heartbeated recently."""
    devices = await redis.hgetall(f"presence:{user_id}:devices")
    now = int(time.time())
    stale = [d for d, ts in devices.items() if now - int(ts) >= timeout]
    if stale:
        await redis.hdel(f"presence:{user_id}:devices", *stale)

The per-device hash also enables richer status like “online on mobile” vs “online on desktop” — useful for apps like Slack that show device-specific indicators.
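
A small sketch of that richer status, assuming device ids follow a naming convention like "mobile:abc123" (the prefix is an assumption, not something the storage requires):

import time

async def online_device_types(redis, user_id: str, timeout=15) -> set[str]:
    """Return the device types ('mobile', 'desktop', ...) active within `timeout`.
    Assumes device_id is formatted as '<type>:<unique id>'."""
    devices = await redis.hgetall(f"presence:{user_id}:devices")
    now = int(time.time())
    types = set()
    for device_id, last_seen in devices.items():
        if isinstance(device_id, bytes):
            device_id = device_id.decode()
        if now - int(last_seen) < timeout:
            types.add(device_id.split(":", 1)[0])
    return types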

Comparison of Approaches

| Approach | Detection speed | Scalability | Complexity | Best for |
| --- | --- | --- | --- | --- |
| WebSocket disconnect event | Unreliable (seconds to minutes) | High (no extra traffic) | Low | Supplement to heartbeats, not a replacement |
| Client heartbeat + Redis TTL | Configurable (5–30s) | High with batching + sharding | Moderate | Real-time presence (chat apps, collaboration tools) |
| Periodic poll (“pull” model) | Depends on poll interval (30–60s) | High (amortized) | Low | “Last seen” display, non-real-time presence |
| Pub/Sub fan-out | Near-instant propagation | Moderate (fan-out cost) | High | Live status updates to active viewers |

ℹ️ Interview tip: When designing presence in a system design interview, say: “Each client heartbeats every 5 seconds. I store the heartbeat as a Redis key with a 15-second TTL — when the key expires, the user is offline. For notifying contacts, I use pub/sub scoped to active viewers: when user B opens a chat screen showing A, B subscribes to A’s presence channel. This limits fan-out to only the users who care right now. For flapping prevention, I debounce offline transitions with a 30-second delay before broadcasting.” This covers detection, storage, fan-out optimization, and edge case handling — the four things interviewers evaluate on this topic.