Ad Click Event Aggregation
Ad Click Event Aggregation
Imagine you are tasked with designing a real-time system to aggregate and analyze ad click events for a large advertising platform. This system needs to handle massive data volumes, provide near real-time insights for reporting and billing, and be resilient to various failures.
Input Data:
Click events are appended to log files located on various ad servers. Each click event has the following attributes:
click_timestamp
: When the click occurred (Unix epoch, milliseconds).user_id
: Unique identifier for the user.ad_id
: Unique identifier for the advertisement clicked.ip
: IP address of the user.country
: Country of the user.
Data Volume:
- Ingestion Rate: Approximately 1 billion ad clicks per day.
- Total Ads: Roughly 2 million unique ads in the system.
- Growth: The number of ad click events is projected to grow by 30% year-over-year.
Functional Requirements:
The system must support the following primary queries and aggregation capabilities:
- Ad Click Count (Per Ad):
- Return the total number of click events for a particular
ad_id
within the lastM
minutes. M
should be a configurable parameter (e.g., 1, 5, 10, 60 minutes).
- Return the total number of click events for a particular
- Top N Most Clicked Ads:
- Return the top 100 most clicked
ad_id
s in the pastM
minutes. M
should be a configurable parameter (e.g., 1, 5, 10, 60 minutes).- Aggregation for this query should occur and be updated every minute.
- Return the top 100 most clicked
- Filtered Aggregations:
- Both of the above queries (Ad Click Count and Top N Ads) must support filtering by
ip
,user_id
, orcountry
. This means a user should be able to ask for, for example, “top 100 ads from users in the USA in the last 5 minutes.”
- Both of the above queries (Ad Click Count and Top N Ads) must support filtering by
Non-Functional Requirements:
- Latency:
- End-to-end latency (from click event generation to queryable aggregation) should be within a few minutes.
- Note: This latency is acceptable as the primary use cases are ad billing and reporting, not real-time bidding (RTB) which has much stricter sub-second latency requirements.
- Correctness:
- The accuracy of aggregation results is paramount, as the data is used for billing and critical reporting.
- Scalability:
- The system must be horizontally scalable to handle current data volumes and the projected 30% YoY growth, effectively operating at “Facebook or Google scale” for this specific use case.
- Reliability & Resilience: The system must account for and gracefully handle the following edge cases:
- Late Events: Click events might arrive later than their
click_timestamp
due to network delays or system issues. - Duplicated Events: The same click event might be sent multiple times by upstream systems or agents.
- System Failures: Individual components or parts of the system might go down at any time, requiring robust recovery mechanisms without data loss.
- Late Events: Click events might arrive later than their