Ad Click Event Aggregation
Ad Click Event Aggregation
Imagine you are tasked with designing a real-time system to aggregate and analyze ad click events for a large advertising platform. This system needs to handle massive data volumes, provide near real-time insights for reporting and billing, and be resilient to various failures.
Input Data:
Click events are appended to log files located on various ad servers. Each click event has the following attributes:
click_timestamp: When the click occurred (Unix epoch, milliseconds).user_id: Unique identifier for the user.ad_id: Unique identifier for the advertisement clicked.ip: IP address of the user.country: Country of the user.
Data Volume:
- Ingestion Rate: Approximately 1 billion ad clicks per day.
- Total Ads: Roughly 2 million unique ads in the system.
- Growth: The number of ad click events is projected to grow by 30% year-over-year.
Functional Requirements:
The system must support the following primary queries and aggregation capabilities:
- Ad Click Count (Per Ad):
- Return the total number of click events for a particular
ad_idwithin the lastMminutes. Mshould be a configurable parameter (e.g., 1, 5, 10, 60 minutes).
- Return the total number of click events for a particular
- Top N Most Clicked Ads:
- Return the top 100 most clicked
ad_ids in the pastMminutes. Mshould be a configurable parameter (e.g., 1, 5, 10, 60 minutes).- Aggregation for this query should occur and be updated every minute.
- Return the top 100 most clicked
- Filtered Aggregations:
- Both of the above queries (Ad Click Count and Top N Ads) must support filtering by
ip,user_id, orcountry. This means a user should be able to ask for, for example, “top 100 ads from users in the USA in the last 5 minutes.”
- Both of the above queries (Ad Click Count and Top N Ads) must support filtering by
Non-Functional Requirements:
- Latency:
- End-to-end latency (from click event generation to queryable aggregation) should be within a few minutes.
- Note: This latency is acceptable as the primary use cases are ad billing and reporting, not real-time bidding (RTB) which has much stricter sub-second latency requirements.
- Correctness:
- The accuracy of aggregation results is paramount, as the data is used for billing and critical reporting.
- Scalability:
- The system must be horizontally scalable to handle current data volumes and the projected 30% YoY growth, effectively operating at “Facebook or Google scale” for this specific use case.
- Reliability & Resilience: The system must account for and gracefully handle the following edge cases:
- Late Events: Click events might arrive later than their
click_timestampdue to network delays or system issues. - Duplicated Events: The same click event might be sent multiple times by upstream systems or agents.
- System Failures: Individual components or parts of the system might go down at any time, requiring robust recovery mechanisms without data loss.
- Late Events: Click events might arrive later than their