
Designing Notification Service

Let's design a notification system that works seamlessly for various applications, whether it's a social media platform, an e-commerce site, or a productivity tool.

A notification system is a centralized platform that delivers alerts and messages (notifications) to users on behalf of multiple separate applications (tenants). It acts as a shared service that any application – be it a social network, an e-commerce site, or a productivity tool – can use to send notifications to its users. For example, think of a system like Amazon Simple Notification Service (SNS) or Firebase Cloud Messaging, which many different apps leverage to inform users about events (like a new message, an order shipped, or a calendar reminder).

Real-world analogy: Consider how Facebook notifies you of a friend’s post, Amazon notifies you about your package delivery, and Slack alerts you about a new mention. A multi-tenant notification service is a single system capable of handling all those scenarios for many companies at once, ensuring each tenant’s data and users stay isolated from each other.

Key Entities/Terms:

Tenant: An independent application or client using the notification service. Each tenant has its own users and notification events (e.g., one tenant is a social media platform, another is an e-commerce site).
Notification: A message or alert about an event, intended for a user or set of users. For instance, “Alice liked your photo” or “Your order #1234 has shipped.”
User: The end recipient of notifications (each user is associated with a specific tenant/application).
Channel: The medium for delivering notifications. Common channels include in-app (within the application’s UI), push (mobile push notifications), email, and SMS. This system should support all major channels so tenants can reach users via their preferred method.
In-App Notification: Notifications shown inside an application (e.g., a badge or feed within the app). These should appear in real-time while the user is active.
Push Notification: Notifications delivered by mobile OS services (like Apple Push Notification service or Firebase Cloud Messaging) to a user’s device, even if the app is not open.
User Preferences: Settings that allow users to control notifications – for example, opting out of certain types, setting “quiet hours” (no notifications at night), or marking some alerts as high priority.
Multi-Tenancy: Architecture where one system serves multiple client applications in isolation. In our design, it means data and traffic for each tenant are logically separated (e.g., tenant A’s notifications and users are distinct from tenant B’s), even though they share the underlying infrastructure.

Before diving into design, let’s clarify what this notification service must do and the constraints it must meet.

Functional Requirements (Features):

Multi-Channel Delivery: Support delivering notifications via email, SMS, mobile push, and in-app channels. Every notification event can be routed to one or multiple channels as appropriate.
Real-Time & Batched Notifications: Handle both immediate event-driven notifications and bulk or scheduled notifications. Real-time notifications (e.g. a chat message or alert) should be delivered within ~1–2 seconds to the user. The system also supports batching use cases – for instance, an app might schedule a daily digest or a marketing campaign to millions of users at once. This requires efficient bulk dispatch (possibly by scheduling or trigger-based batch jobs) in addition to one-off, on-demand sends.
Message Scheduling: Allow notifications to be scheduled for future delivery (e.g., a reminder set to go out at a specific time). Instead of immediate dispatch, such requests are stored and sent at the correct time via a scheduler component. For example, a meeting reminder can be queued to send 10 minutes before the event.
Retry on Failure: Implement retry mechanisms for failed deliveries. If a notification fails to send on a channel (due to transient errors such as provider downtime or network issues), the system should retry sending after a short delay, possibly with exponential backoff. This ensures at-least-once delivery – the system will make multiple attempts so that temporary issues don’t cause message loss.
User Preferences & Opt-Out: Respect user notification preferences. The system should check if the user has unsubscribed or opted out of certain notification types or channels. For example, if a user disabled SMS notifications or opted out of promotional emails, the system must honor that and skip those channels. It should also support user-specific settings like preferred channels for certain alert types (e.g. critical alerts via SMS, newsletters via email).
Multi-Tenant Support: The service must isolate tenants’ data. Each tenant can only access and send notifications to its own users.

Notification Content & Templates: Support customizable content for notifications. Tenants (or their product teams) should be able to define message templates for each notification type, possibly with placeholders (e.g., “Hello {name}, your order {order_id} shipped.”). The service will fill in event-specific data.

No Channel Prioritization: All supported channels are treated equally by the system. If an event triggers notifications on multiple channels, each is delivered independently without prioritizing one over the other.
In-App Notification Management: Support in-app messaging – notifications that appear within the application’s UI (e.g., the notification bell 🔔 in a social app). This implies the system will store notifications in a database so that a user can fetch their notification list. It can also push in-app alerts in real-time to active user sessions (via WebSocket or similar).
API for Notification Producers: Provide clean APIs or endpoints that various applications (social media, e-commerce, productivity tools, etc.) can call to create notification requests. Each request includes details like the user(s) to notify, the content (possibly a template ID and data), and which channel(s) to use. The API should handle single-recipient notifications as well as bulk sends (e.g., specifying a list of users or a user segment for broadcast).
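The retry requirement above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `TransientError`, `send_with_retry`, and the default delays are hypothetical names and values chosen for the example, and random jitter is added as an assumption so that many failing workers do not retry in lockstep.

```python
import random
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, provider downtime)."""


def send_with_retry(
    send: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it succeeds and return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise  # attempts exhausted; the caller decides what happens next
            # Exponential backoff: 0.5s, 1s, 2s, ... capped at max_delay,
            # with a random delay in [0, cap] (jitter).
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")
```

Because a retried send may reach the provider more than once, at-least-once delivery can produce duplicates; a client-supplied idempotency key is one way to detect them.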

Non-Functional Requirements (Scale, Performance, Constraints):

Scalability: The system should handle web-scale load. This means potentially millions of users and high notification volumes:
- For a social media tenant, there could be hundreds of millions of notifications per day.
- The design should scale horizontally (add more machines to handle more load) without any bottlenecks.
High Throughput: Support very high write and read rates. For example, the system might ingest tens of thousands of notification events per second during peak (imagine a viral post generating notifications or a major sale event).
Low Latency: For real-time channels (in-app and push), end-to-end latency from event to user device should be low: ideally under a second, and within 1–2 seconds at most. Email/SMS can tolerate slight delays (e.g., < 1 minute).
High Availability: The service should be reliable – aiming for minimal downtime (e.g., 99.99% uptime). Notifications can be critical (password resets, security alerts, order confirmations), so the system must be redundant and fault-tolerant across data centers or AZs (Availability Zones).
Durability: No lost notifications. Once an event is accepted, it should be queued/stored so that even if servers crash, the notification will eventually be delivered.
Ordering: Within a single user’s notification feed, the system should maintain chronological order (at least eventual consistency in order). If two events for the same user occur, they should not appear out-of-order in the in-app feed. (Exact global ordering across the whole system is not required.)
Multi-Tenant Isolation & Security: One tenant’s heavy load or failure should not cascade to others. We may need to enforce per-tenant rate limits or partitioning. Data must be partitioned or tagged by tenant so that it’s impossible for tenant A to access tenant B’s notifications or user info. Also, ensure secure authentication for tenants calling the API.
Flexibility & Extensibility: It should be easy to add new channels (say, WhatsApp or push for a new platform) without redesigning the whole system. Similarly, adding new features like notification scheduling (send at a specific time) or batching (daily digests) in the future should be possible.
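One common way to enforce the per-tenant rate limits mentioned above is a token bucket per tenant. The sketch below is an in-process illustration only; `TokenBucket`, `TenantRateLimiter`, and the injected `clock` are hypothetical names, and a multi-instance deployment would presumably need shared state rather than a local dictionary.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # maximum burst size
    tokens: float    # tokens currently available
    updated: float   # time of the last refill

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantRateLimiter:
    """Keeps one bucket per tenant so a noisy tenant cannot starve the others."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            # New tenants start with a full bucket.
            bucket = TokenBucket(self._rate, self._capacity, self._capacity, now)
            self._buckets[tenant_id] = bucket
        return bucket.allow(now)
```

A request for which `allow` returns `False` would map naturally to a 429 Too Many Requests response at the API layer.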

With these requirements, we have a clear target: a fast, reliable notification service that can handle all tenants’ events and deliver via the appropriate channels, respecting user preferences, even at a massive scale.

Let’s estimate the expected scale to ensure our design can handle the load. (These are rough numbers to guide our choices.)

Number of Tenants & Users: Suppose we serve 10–50 tenant applications. Some might be large (tens of millions of users) and others smaller. In total, imagine 100 million registered users across tenants, with 20–30 million daily active users (DAUs) generating or receiving notifications.
Notification Volume: Assume on average each active user triggers or receives ~5 notifications per day (varies by app: social apps might be higher, e-commerce lower). That’s on the order of 100–150 million notifications per day system-wide. In peak scenarios (like a viral event or big sale), this could spike higher.
- Peak Throughput: 150 million/day is ~1.7k notifications per second on average. Peaks could be 10x higher (~17k/sec), so we should design for 20–50k notification events per second at peak. For example, if one tenant is a social network and a celebrity with 10 million followers posts something at noon, that single event could fan out into 10 million notifications over a few minutes (roughly 30–55k/sec if spread over 3–5 minutes). That requires huge burst throughput.
Read vs Write:
- Writes: Every notification event is a write (to databases/queues). If we have ~150M/day, that’s ~1.7k writes/sec on average, with bursts as described (tens of thousands/sec).
- Reads: Users reading in-app notifications also generate load. If 20M users check their in-app notifications roughly once a day, that could be 20M read operations per day (~230 reads/sec average, but likely spiky around morning/evening). Many reads will fetch a small list (the user’s recent notifications).
- In real-time, many users will get notifications pushed without needing to pull, but when they open the app or if they load an inbox, it triggers reads. Write volume (sending notifications) will likely exceed read volume if push delivery is common.
Storage:
- Notification Storage: If we store notifications for in-app history, assume we keep the last 100 notifications per user. With 100M users, that’s up to 10 billion notification records in storage. However, not all users are active; focusing on 20M actives with, say, 50 each, it’s around 1 billion records. If each stored notification entry is ~500 bytes (including message text, metadata), 1 billion records is ~500 GB. With overhead and replication, a few TB of storage are needed. This is large but manageable with distributed databases.
- Preferences Storage: One record per user for preferences (which channels enabled, quiet hours etc.). 100M users * a few hundred bytes each = tens of GB. This fits easily in a partitioned SQL database or a NoSQL store.
- Other Data: Possibly device tokens for push, email addresses if stored here (though user data might live in tenant’s DB – we might just receive those when sending).
- Bandwidth: Pushing text-based notifications is relatively lightweight. For example, 100M notifications * ~1KB each on average = ~100 GB of data transferred per day just in notification payloads. We need network capacity for this, but it’s feasible across a distributed system.
Latency Expectation:
- In-app/push: ideally <1–2 seconds from event to device notification. So our pipeline (ingestion -> processing -> push) should introduce minimal delay (tens or low hundreds of milliseconds at each step).
- Email/SMS: Sending of an email within ~1 minute is fine. External email gateways might introduce some seconds of delay, but generally sending out an email should be a sub-second operation on our side; the email might arrive in inbox seconds or tens of seconds later.
- We’ll have to buffer or queue events, but the queue should not add too much delay except during enormous spikes.
Throughput per Channel:
- Push notifications: If a major event happens, we might send tens of thousands per second to APNs/FCM. We need to maintain connections and possibly throttle per provider guidelines.
- Emails: If using an email service or server, sending thousands per second is possible but might need multiple servers or providers for large scale. (150M/day emails would be extremely high; realistically, not all notifications go via email – many are in-app or push. Email might be a smaller fraction of total notifications.)
- SMS: Likely lower volume due to cost and use-case (used only for urgent things like 2FA or order updates). Could be tens of thousands a day at most, which is trivial in comparison – but SMS has costs and external gateway limits that we’d account for.

These estimations highlight the need for aggressive horizontal scaling (lots of parallel processing), efficient data partitioning, and asynchronous processing. The system must be distributed to handle peak loads and large storage.
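The back-of-the-envelope numbers above can be reproduced with a few lines of arithmetic. The inputs come from the estimates in this section; the 5-minute fan-out window is an assumption standing in for "a few minutes", and 1 KB is taken as 1,000 bytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Write throughput
daily_notifications = 150_000_000
avg_per_sec = daily_notifications / SECONDS_PER_DAY   # about 1,736
peak_per_sec = 10 * avg_per_sec                        # about 17,361

# Celebrity fan-out burst (window length is an assumption)
celebrity_fanout = 10_000_000
fanout_window_sec = 5 * 60
fanout_per_sec = celebrity_fanout / fanout_window_sec  # about 33,333

# Read throughput
daily_reads = 20_000_000
reads_per_sec = daily_reads / SECONDS_PER_DAY          # about 231

# Storage: 20M active users x 50 stored notifications x 500 bytes
stored_records = 20_000_000 * 50
storage_gb = stored_records * 500 / 1e9                # 500 GB before replication

# Bandwidth: 100M notifications x 1 KB
payload_gb_per_day = 100_000_000 * 1_000 / 1e9         # 100 GB per day

print(f"avg writes/sec:   {avg_per_sec:,.0f}")
print(f"10x peak/sec:     {peak_per_sec:,.0f}")
print(f"fan-out burst/sec:{fanout_per_sec:,.0f}")
print(f"avg reads/sec:    {reads_per_sec:,.0f}")
print(f"storage (GB):     {storage_gb:,.0f}")
print(f"payload (GB/day): {payload_gb_per_day:,.0f}")
```

The fan-out burst alone exceeds the 10x peak, which is why the design target sits in the 20–50k events per second range rather than at the daily average.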

Now let’s design the architecture with these numbers in mind.

We expose a REST API.

1. Send Notification

POST /v1/notifications
Request Body ("priority" is "high" or "low"):

{
  "user_id": "u_12345",
  "template_id": "order_shipped_v1",
  "priority": "high",
  "idempotency_key": "uuid-gen-by-client",
  "data": {
    "order_id": "999",
    "name": "Alice"
  }
}

Response: 202 Accepted

{ "notification_id": "notif_abc123" }

Error: 429 Too Many Requests (Quota exceeded).
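To illustrate how the idempotency_key in the request might be used, here is a minimal in-memory sketch of the intake step. `NotificationIntake`, the `enqueue` callback, the 400 response for missing fields, and scoping keys by a `tenant_id` taken from authentication are all assumptions made for the example; a real service would keep the key mapping in a shared store with an expiry.

```python
import uuid
from typing import Callable, Dict, Tuple


class NotificationIntake:
    """Accepts send requests and deduplicates retries by idempotency key."""

    REQUIRED_FIELDS = ("user_id", "template_id", "idempotency_key")

    def __init__(self, enqueue: Callable[[dict], None]):
        self._enqueue = enqueue
        # (tenant_id, idempotency_key) -> notification_id already issued
        self._seen: Dict[Tuple[str, str], str] = {}

    def accept(self, tenant_id: str, body: dict) -> Tuple[int, dict]:
        for field in self.REQUIRED_FIELDS:
            if field not in body:
                return 400, {"error": f"missing field: {field}"}

        key = (tenant_id, body["idempotency_key"])
        existing = self._seen.get(key)
        if existing is not None:
            # A client retry: return the original ID without enqueueing again.
            return 202, {"notification_id": existing}

        notification_id = "notif_" + uuid.uuid4().hex[:12]
        self._enqueue({**body, "notification_id": notification_id,
                       "tenant_id": tenant_id})
        self._seen[key] = notification_id
        return 202, {"notification_id": notification_id}
```

Scoping the key by tenant means two tenants that happen to generate the same key never collide.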

2. Get Status

GET /v1/notifications/{notification_id}
Response: { "status": "delivered", "channel": "sms", "updated_at": "..." }

3. Update Preferences

PUT /v1/users/{user_id}/preferences
Body: { "categories": { "marketing": false, "shipping": true } }

4. High-Level Design

At a high level, the notification system will be designed as a distributed, event-driven pipeline with multiple components cooperating asynchronously. The core idea is to decouple the act of generating a notification from the act of delivering it using a reliable queue, which allows the system to scale and to buffer bursts. Different components will handle: intake of requests, scheduling, queueing, processing per channel, and persistence. Each component can scale horizontally and be managed independently. Here's an outline of the architecture:

Producer (Client Application): This is not part of our system per se, but it's any external app or service (the tenants) that needs to send notifications. They will interface with our system through an API or message interface.
API Gateway / Load Balancer: Fronts the service, receiving API calls from various applications that want to send notifications. It handles authentication, rate limiting, and load balancing across multiple instances of the notification service.
Notification Service (Dispatcher): The core service that accepts notification requests, processes them (validates data, checks preferences, formats messages), and enqueues the notification for delivery. This acts as the brain of the system coordinating between upstream requests and downstream delivery channels.
User Preference Service/DB: A service or database that stores user notification settings – which channels they prefer for which types of notifications, do-not-disturb windows, subscription statuses, etc. The Notification Service queries this to filter or modify outgoing notifications according to user wishes.
Notification Queue System: A durable, high-throughput messaging/queue system (e.g. Apache Kafka, RabbitMQ, AWS SQS) that buffers notifications and decouples the ingestion of notification requests from the actual delivery processing. The Notification Service produces messages to the queue; downstream Channel Workers consume from it. This decoupling allows asynchronous processing and smooths out spikes in load, providing backpressure if needed.
Scheduler (for Delayed/Batched Jobs): A component responsible for scheduling notifications that aren’t meant to be sent immediately. This could be a separate Scheduler Service that stores future notifications (e.g. in a database or a priority queue) with their target send time. When the time arrives, it pushes those notifications into the main Queue for delivery. This enables features like "send an alert at 9 AM" or batch sends for campaigns.
Channel Workers: These are worker services specialized for each delivery channel: e.g., Email Sender, SMS Sender, Push Notification Sender, In-App Notifier. They pull messages from the Notification Queue and actually send the notification via the appropriate third-party service or protocol. Each channel worker knows how to format and communicate with the channel’s delivery mechanisms (for instance, the Email worker integrates with an SMTP server or email API service like SendGrid; the Push worker calls Apple or Google push services, etc.).
Notification Database(s): Storage for persistent data: user preferences, templates, and notification logs or in-app messages. This might be split into multiple specialized stores: e.g., a relational database for structured, slowly changing data (tenants, users, preferences, templates), a NoSQL store for high-volume notification history and delivery statuses, and maybe blob storage for any large content/attachments that are part of notifications. Preference lookups on the hot path can be served from a cache. In-app notifications likely reside in a table keyed by user so they can be retrieved when the user opens the app.
Template Service/Repository: (Optional) A component where notification templates are defined and stored. Rather than each request providing full text for every channel, clients could reference a template ID and data variables. The Notification Service would fetch the template and generate the actual message content per channel. This ensures consistency and easier updates to notification formats. Templates might be stored in a DB or even managed by the Notification Service itself if simple.
Monitoring & Analytics Services: (Supplementary) Components that gather metrics, logs, and health information from the system for monitoring, alerting, and analysis. Not part of the main data flow but critical for operations.

Data Flow (Notification Request to Delivery)

Let's walk through how a notification travels through the system in real-time mode, and in batch mode:

Request Ingestion: An external application (producer) makes an API call to send a notification. For example, an e-commerce platform calls POST /v1/notifications with the user’s ID, notification type (say, "OrderShipped"), and perhaps some content or template data. This request hits the API Gateway, which authenticates it (ensuring the caller is allowed to send notifications) and forwards it to one of the Notification Service instances.
Request Handling in Notification Service: The Notification Service receives the request. It validates the payload – confirming required information is present (such as a recipient identifier, message content or template, and at least one channel) and that it’s well-formed. It might also perform authentication/authorization checks if not done earlier. After validation, it quickly acknowledges receipt (especially if this is a synchronous API call) so the upstream caller can proceed, while the notification is handled asynchronously thereafter.
Preference & Routing Logic: The Notification Service then determines how to route this notification. It queries the User Preference Service (or reads from a cached preferences store) to get the user’s current notification settings. This yields information such as: which channels the user has enabled, any channel overrides for this notification type, and whether the user has hit any rate limits or snooze settings. For example, if the user opted out of email for promos, and this is a promotional notification, the service would exclude email from the channels. It also checks system-wide rules (e.g., not sending more than N marketing notifications to a user per day) to decide if this notification should be throttled or dropped.
Message Preparation: Now, the service knows which channels to send the notification through (based on the request and preferences). It prepares the content for each channel. If templates are used, the service fetches the template for, say, email and fills in the dynamic fields (like user name, order details). Each channel may require a slightly different payload – e.g., an email needs a subject and HTML body, whereas an SMS needs a short text, and a push notification might have a title, body, and custom data for the app. The Notification Service can transform the request into channel-specific messages.
Enqueueing Messages: The service then enqueues the notification for delivery. Since there’s no priority among channels, it will enqueue a separate message for each channel the notification should go out on. This can be done in two ways: (a) using a unified queue/topic with the channel as part of the message data, or (b) using separate queues or topics for each channel (e.g., an "Email" queue, "SMS" queue, etc.). A common design is a separate topic per channel (for instance, one Kafka topic each for email, SMS, and push) so that channel workers can consume their respective messages independently. For example, if a notification needs to go via Email, SMS, and Push, three messages are produced: one to the Email topic, one to the SMS topic, one to the Push topic. Each message includes the content and metadata needed for that channel (like the email subject or the SMS text, destination addresses, etc., along with a notification ID and perhaps a retry count). By pushing messages to a queue, we decouple the immediate request handling from the slower delivery process – the API call can complete quickly, and actual sending happens asynchronously.
(Scheduled Notifications): If the request included a future send time or it’s part of a batch campaign, the Notification Service would invoke the Scheduler instead of immediately enqueueing for delivery. For a scheduled notification, the service stores the request (for example, in a Scheduler Service database or a delayed-queue) with the desired send timestamp. The Scheduler continuously scans or waits until the send time is reached, then moves those pending notifications into the regular delivery queue (feeding into the Enqueueing Messages step at the appropriate time). This way, scheduled notifications join the same pipeline as real-time ones when their time comes. Batched campaigns might be handled by inserting many notifications into the scheduler or queue in bulk (possibly with their own tools to generate a large list of user-specific messages, as was done in the Duolingo Super Bowl campaign using precomputed user lists in S3).
Queue Processing and Channel Delivery: The Channel Workers (referred to below as channel processors) are subscribed to the Notification Queue (or specific topics). Each channel processor continuously pulls messages intended for its channel. For example, an Email Worker instance will read from the Email topic/queue. Once a message is pulled, the worker processes it: it establishes a connection to the relevant delivery provider or service and attempts to send the notification. This involves using channel-specific protocols or third-party APIs:
- The Email Processor might call an email sending service or SMTP server (such as SendGrid, Mailgun, or Amazon SES) to send out the email. It will format the email (HTML or text) if not already formatted, handle attachments or images, and send the message to the recipient’s email address.
- The SMS Processor will send the SMS text via an SMS gateway API (like Twilio or Nexmo). It may need to ensure the text fits within SMS length limits or split long messages, handle country codes, etc.
- The Push Notification Processor uses push services such as Firebase Cloud Messaging (FCM) for Android and Apple Push Notification service (APNs) for iOS. It prepares the payload (title, body, icon, any custom data) and sends it to the respective service, which will then deliver it to the app on the user’s device.
- The In-App Notification Processor will write the notification to the in-app notifications store (database) and if the user is currently online/connected, it can also deliver it in real-time via a persistent connection (for example, via a WebSocket message to the user’s session). This ensures the notification appears inside the app immediately if the user is active, and is stored for later retrieval if the user is offline.
  Each of these processors operates independently, so an email notification could be sending at the same time as the SMS and push for the same event – there’s no requirement to do them in sequence. This parallelism improves overall latency for multi-channel notifications.
Delivery Confirmation & Logging: After attempting to send, each channel processor gets a result – success or failure. For channels with synchronous APIs, the result is immediate; for some (like email), there might be callbacks or events for delivery/bounce later, but at least initial send status is known. The processor then logs the outcome of the notification into a Notification Logs store. A log entry typically includes: notification ID, user ID, channel, timestamp, status (sent, delivered, failed, etc.), and perhaps an error code or provider response if failed. These logs enable tracking and auditing – e.g. for customer support queries “I didn’t get this email” or for system health metrics. If the send was successful, the log is marked delivered; if it failed, the system may mark it for retry (and log an initial failure with maybe a pending status). All log entries are stored in a database built to handle high write volume. This could be a relational DB table or a time-series store or even a log indexing system, depending on query needs.
User Notification Retrieval: For in-app notifications, the final step is allowing the user to read their notifications. The app’s client (e.g., mobile app or web frontend) can call an API (possibly the Notification Service or a dedicated Notification Query Service) to fetch recent notifications for that user. The service will query the Notifications DB for that user’s records (potentially using a cache for speed) and return them so the client can display, for example, a list of notifications (read/unread status, etc.). This read path is separate from the write path of sending notifications to maintain high throughput – often even separated into a different service or read replica to not impact send performance. The query service can also support features like marking notifications as read, or searching within notifications, but those are extensions beyond core delivery.
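The scheduled-notification step in the flow above can be sketched with a min-heap ordered by send time. This is an in-memory illustration only: `Scheduler`, `schedule`, and `tick` are hypothetical names, and the design as described would keep pending notifications in a durable store rather than in process memory.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Scheduler:
    """Holds future notifications and releases them once their send time arrives."""

    def __init__(self, enqueue: Callable[[dict], None]):
        self._enqueue = enqueue
        self._heap: List[Tuple[float, int, dict]] = []
        # Sequence number breaks ties so two dicts are never compared.
        self._seq = itertools.count()

    def schedule(self, send_at: float, notification: dict) -> None:
        heapq.heappush(self._heap, (send_at, next(self._seq), notification))

    def tick(self, now: float) -> int:
        """Move every notification that is due into the delivery queue."""
        released = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, notification = heapq.heappop(self._heap)
            self._enqueue(notification)
            released += 1
        return released


# Usage: the callback stands in for producing to the regular delivery queue.
delivery_queue: List[dict] = []
scheduler = Scheduler(delivery_queue.append)
scheduler.schedule(send_at=1_700_000_600.0, notification={"id": "reminder-1"})
scheduler.tick(now=1_700_000_000.0)  # nothing due yet
scheduler.tick(now=1_700_000_600.0)  # releases reminder-1
```

Calling `tick` from a periodic loop gives the "continuously scans" behaviour; the heap keeps each check proportional to the number of due items rather than the total backlog.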

Throughout this flow, no single channel is prioritized or blocks another – each channel’s delivery is handled independently via the queue. This design ensures the fastest possible dispatch on all channels and isolates any slowness (for instance, an email service lag) to that channel’s worker without holding up others.
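The per-channel fan-out described in this flow might look like the sketch below. The message fields (`channels`, `body`, `retry_count`) and the `route` and `enqueue_all` helpers are hypothetical, and plain Python lists stand in for the per-channel queues or topics.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


def route(notification: dict, enabled_channels: Set[str]) -> List[dict]:
    """Build one message per requested channel that the user has enabled."""
    messages = []
    for channel in notification["channels"]:
        if channel not in enabled_channels:
            continue  # user opted out of this channel; skip it
        messages.append({
            "notification_id": notification["notification_id"],
            "user_id": notification["user_id"],
            "channel": channel,
            "body": notification["body"],
            "retry_count": 0,
        })
    return messages


def enqueue_all(messages: List[dict], topics: Dict[str, List[dict]]) -> None:
    """Append each message to its own channel topic; channels never block each other."""
    for message in messages:
        topics[message["channel"]].append(message)


# Usage: the user has disabled SMS, so only email and push are enqueued.
topics: DefaultDict[str, List[dict]] = defaultdict(list)
request = {
    "notification_id": "notif_abc123",
    "user_id": "u_12345",
    "channels": ["email", "sms", "push"],
    "body": "Your order 999 shipped.",
}
enqueue_all(route(request, enabled_channels={"email", "push"}), topics)
```

Each worker then consumes only its own topic, so a slow email provider delays email messages without touching the push or SMS backlog.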

6. Data Storage and Schema

In this hybrid notification system, SQL tables store structured, relational data (like user info, preferences, and templates) for consistency and ease of management, while NoSQL tables handle high-volume, unstructured event data (notification history and statuses) for scalability. This approach leverages the strengths of both: relational databases ensure ACID compliance for critical metadata, and NoSQL databases provide flexible schemas and horizontal scaling to handle millions of notification records efficiently. The schema is listed below table by table, with each table’s fields, data types, and descriptions, followed by a short explanation of the design considerations.

SQL Tables

The SQL tables manage static or slowly changing data such as tenant configurations, user profiles, notification preferences, templates, scheduled jobs, and audit logs. These tables are normalized and use clear relationships (foreign keys) to maintain data integrity. Each table is outlined with its fields:

tenants

Stores tenant (application) details. Each tenant represents a client application or organization using the notification system.

Field Name	Data Type	Description
tenant_id	INT (PK)	Unique identifier for the tenant (primary key).
name	VARCHAR(255)	Name of the tenant application or organization.
contact_email	VARCHAR(255)	Contact email for the tenant’s administrator or support.
created_at	DATETIME	Timestamp when the tenant was registered.
status	VARCHAR(50)	Status of the tenant (e.g., “active”, “inactive”), used to enable/disable notifications for this tenant.

Explanation: The tenants table ensures multi-tenancy support by segregating data per application. It includes basic identification and contact info for each tenant, along with a status flag to control activity. This structured information is well-suited for a relational table since it’s relatively static and requires consistency.

users

Stores users who receive notifications. Users are associated with tenants.

Field Name	Data Type	Description
user_id	INT (PK)	Unique identifier for the user (primary key).
tenant_id	INT (FK)	Identifier of the tenant this user belongs to (foreign key to `tenants`).
name	VARCHAR(100)	Full name of the user.
email	VARCHAR(255)	Email address of the user (used for email notifications).
phone	VARCHAR(20)	Phone number of the user (used for SMS or phone notifications).
created_at	DATETIME	Timestamp when the user account was created.
status	VARCHAR(50)	Account status of the user (e.g., “active”, “disabled”).

Explanation: The users table holds profile and contact information for notification recipients. It relates to the tenants table via tenant_id to ensure each user is tied to the correct tenant (application). Storing users in SQL guarantees referential integrity (e.g., notifications link to valid users) and makes it easy to join with preferences or templates if needed.

user_preferences

Stores per-user notification preferences, such as which types of notifications or channels a user has enabled or disabled.

Field Name	Data Type	Description
user_id	INT (PK, FK)	Identifier of the user (primary key, foreign key to `users`). Each user has one preferences record.
email_notifications	BOOLEAN	Whether the user wants to receive email notifications (true = opt-in).
sms_notifications	BOOLEAN	Whether the user wants to receive SMS/text notifications.
push_notifications	BOOLEAN	Whether the user wants to receive push/in-app notifications.
language_preference	VARCHAR(10)	(Optional) Preferred language or locale for notifications (e.g., "en", "es").
updated_at	DATETIME	Timestamp of the last update to this user’s preferences.

Explanation: The user_preferences table captures notification settings for each user. By separating preferences from the main user profile, the system can easily extend or modify preference fields without altering core user data. Each user has one row containing various preference flags (for different channels or categories of notifications). In a relational model, this structured approach simplifies queries for preference-controlled sends (for example, checking if a user has opted into email before sending).
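A preference check against these tables could look like the following sketch. It uses Python's built-in sqlite3 module with an in-memory database purely for illustration; the trimmed-down columns, the `channel_enabled` helper, and the choice to treat a missing row as "not enabled" are assumptions, not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    email     TEXT
);
CREATE TABLE user_preferences (
    user_id             INTEGER PRIMARY KEY REFERENCES users(user_id),
    email_notifications INTEGER NOT NULL DEFAULT 1,
    sms_notifications   INTEGER NOT NULL DEFAULT 1,
    push_notifications  INTEGER NOT NULL DEFAULT 1
);
""")
conn.execute("INSERT INTO users VALUES (1, 10, 'alice@example.com')")
# Alice allows email and push but has opted out of SMS.
conn.execute("INSERT INTO user_preferences VALUES (1, 1, 0, 1)")

# Channel names map to fixed column names; raw input is never put into the SQL.
_CHANNEL_COLUMNS = {
    "email": "email_notifications",
    "sms": "sms_notifications",
    "push": "push_notifications",
}


def channel_enabled(db: sqlite3.Connection, tenant_id: int,
                    user_id: int, channel: str) -> bool:
    """Return True only if the user belongs to the tenant and opted in."""
    column = _CHANNEL_COLUMNS[channel]
    row = db.execute(
        f"SELECT p.{column} FROM user_preferences p "
        "JOIN users u ON u.user_id = p.user_id "
        "WHERE u.user_id = ? AND u.tenant_id = ?",
        (user_id, tenant_id),
    ).fetchone()
    return bool(row and row[0])
```

Filtering on tenant_id in the same query is a simple guard for tenant isolation: a lookup made with the wrong tenant returns nothing even when the user ID exists.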

notification_templates

Stores reusable message templates for notifications. Templates allow consistent formatting of messages and reuse across multiple notifications.

Field Name	Data Type	Description
template_id	INT (PK)	Unique identifier for the notification template.
tenant_id	INT (FK)	Tenant that owns this template (foreign key to `tenants`), allowing tenants to have custom templates.
name	VARCHAR(100)	Template name or key (e.g., “WelcomeEmail”, “PasswordReset”).
subject	VARCHAR(255)	Subject line for the notification (if applicable, e.g., for email notifications).
body	TEXT	Body content of the template, with placeholders for dynamic data (e.g., “Hello {username}, ...”).
channel	VARCHAR(50)	Channel/type of notification this template is for (e.g., “email”, “sms”, “push”).
created_at	DATETIME	Timestamp when the template was created.
updated_at	DATETIME	Timestamp when the template was last modified.

Explanation: The notification_templates table enables defining message formats once and reusing them. Each template can be specific to a channel and personalized with placeholders. Storing templates in SQL means they can be updated transactionally (e.g., changing the content or disabling a template), with versioning added if needed. Since templates are relatively static, shared data, an SQL table is appropriate for strong consistency (so that notifications always use the latest approved content for compliance).
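To illustrate placeholder substitution, here is a small sketch that fills a template body using Python's built-in `str.format`. The `render_template` function is a hypothetical helper, and the `{name}` placeholder syntax is assumed from the example in the body column; a production system might prefer a dedicated template engine, since literal braces in a body would need escaping here.

```python
def render_template(body, params):
    """Fill {placeholder} fields in a template body; fail loudly if one is missing."""
    try:
        return body.format(**params)
    except KeyError as exc:
        raise ValueError(f"missing template parameter: {exc.args[0]}") from exc


body = "Hello {username}, your order #{order_id} has shipped."
print(render_template(body, {"username": "Alice", "order_id": 1234}))
# Hello Alice, your order #1234 has shipped.
```

Raising on a missing parameter, rather than sending a message with an unfilled placeholder, is one reasonable choice; the source does not specify which behavior is intended.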

scheduled_notifications

Stores notifications that are scheduled for future delivery (e.g., reminders or announcements that should be sent at a later time or on a specific schedule).

Field Name	Data Type	Description
schedule_id	INT (PK)	Unique identifier for the scheduled notification entry.
tenant_id	INT (FK)	Tenant context for the notification (foreign key to `tenants`), helpful for multi-tenant filtering.
user_id	INT (FK)	The intended recipient user’s ID (foreign key to `users`).
template_id	INT (FK)	Reference to the notification template to use (foreign key to `notification_templates`).
scheduled_time	DATETIME	Date and time when the notification is scheduled to be sent.
status	VARCHAR(50)	Current status of the scheduled notification (e.g., “scheduled”, “sent”, “canceled”).
created_at	DATETIME	Timestamp when this schedule was created.
created_by	VARCHAR(50)	Identifier of who scheduled the notification (could be a user ID or system/admin user).

Explanation: The scheduled_notifications table holds messages that need to be sent in the future. Each entry ties a user with a template to send at a specific time. Using SQL here allows for reliable scheduling (with transactions to avoid duplicate scheduling or race conditions) and easy querying (e.g., find all notifications to send in the next hour). The status field tracks whether the job is still pending, already sent, or cancelled, and can be updated as the job is processed. When the scheduled time arrives, the system will retrieve these entries and then create actual notification events (which get recorded in the NoSQL notifications history).
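The "find what is due and mark it processed" step can be sketched with SQLite from the Python standard library. This is an illustration under assumptions: the table is trimmed to a few columns, timestamps are stored as ISO-8601 strings, and `make_demo_db` and `claim_due` are hypothetical helpers. The status guard in the `UPDATE` is what keeps a row from being claimed twice.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with three scheduled rows (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE scheduled_notifications (
            schedule_id    INTEGER PRIMARY KEY,
            user_id        INTEGER NOT NULL,
            template_id    INTEGER NOT NULL,
            scheduled_time TEXT NOT NULL,  -- ISO-8601, so string order matches time order
            status         TEXT NOT NULL DEFAULT 'scheduled'
        )
    """)
    conn.executemany(
        "INSERT INTO scheduled_notifications (user_id, template_id, scheduled_time) "
        "VALUES (?, ?, ?)",
        [(1, 10, "2024-01-01T09:00:00"),
         (2, 10, "2024-01-01T09:30:00"),
         (3, 11, "2024-01-01T12:00:00")],
    )
    conn.commit()
    return conn


def claim_due(conn, now_iso):
    """Mark due rows as 'sent' and return their ids, oldest first."""
    rows = conn.execute(
        "SELECT schedule_id FROM scheduled_notifications "
        "WHERE status = 'scheduled' AND scheduled_time <= ? "
        "ORDER BY scheduled_time",
        (now_iso,),
    ).fetchall()
    claimed = []
    for (schedule_id,) in rows:
        cur = conn.execute(
            "UPDATE scheduled_notifications SET status = 'sent' "
            "WHERE schedule_id = ? AND status = 'scheduled'",
            (schedule_id,),
        )
        if cur.rowcount == 1:  # 0 would mean another worker got there first
            claimed.append(schedule_id)
    conn.commit()
    return claimed


conn = make_demo_db()
print(claim_due(conn, "2024-01-01T10:00:00"))  # [1, 2]
print(claim_due(conn, "2024-01-01T10:00:00"))  # []
```

In a real deployment the claimed rows would be enqueued for delivery, and an intermediate status could be used so that a crash between claiming and sending is detectable.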

audit_logs

Stores logs for compliance and tracking important system events. This includes records of notifications sent, preference changes, template edits, or any administrative actions, to provide an audit trail.

Field Name	Data Type	Description
log_id	BIGINT (PK)	Unique identifier for the audit log entry.
tenant_id	INT (FK)	Tenant associated with the event (foreign key to `tenants`), if applicable.
user_id	INT (FK)	User associated with the event (if applicable, e.g., the user who was notified or whose preferences changed).
action	VARCHAR(100)	Description of the action or event (e.g., “NOTIFICATION_SENT”, “PREFERENCE_UPDATED”, “TEMPLATE_MODIFIED”).
details	TEXT	Additional details about the event (e.g., notification content snippet, old vs new values for changes, error messages if any).
performed_by	VARCHAR(100)	Who performed the action (could be a user id, admin id, or system process name).
timestamp	DATETIME	Date and time when the event was logged.

Explanation: The audit_logs table provides a historical record of significant events for compliance auditing and debugging. Storing these in SQL gives durable writes with ACID guarantees, which is important for compliance requirements. Immutability is not automatic, so the table should be treated as append-only (for example, by granting the application INSERT but not UPDATE or DELETE privileges). Whenever a notification is sent, an entry could be logged here (in addition to the NoSQL history) to satisfy regulatory needs that require easily queryable records of communication. It can also log user preference changes or template changes to track who did what and when. This structured log data can be filtered by tenant or user for audit reports.

NoSQL Tables

The NoSQL collections handle the high-scale data: the actual notifications sent to users and the tracking of their delivery/read status. Using a NoSQL database (like MongoDB or Cassandra) for this part of the system allows it to scale horizontally to millions of records and high write throughput with flexible schema. Each notification event is stored as a document, and any updates to its delivery status are recorded separately.

notifications

(NoSQL collection) Stores the notification history for users – each record represents a notification event (a message sent to a user).

Field Name	Data Type	Description
notification_id	UUID/String (PK)	Unique identifier for the notification event (could be a UUID or auto-generated ObjectID in a document store).
tenant_id	INT or String	Tenant identifier for context (duplicates the tenant for quick filtering in NoSQL, since joins aren’t used).
user_id	INT or String	ID of the user who received the notification.
template_id	INT or String	Reference to the template used (if any) for this notification.
message	TEXT/JSON	The content of the notification sent (could be text, JSON for structured content, etc.).
channel	STRING	Delivery channel used (e.g., “email”, “SMS”, “push”).
sent_at	DATETIME	Timestamp when the notification was sent (or created in the system).
metadata	JSON (optional)	Additional metadata for the notification (e.g., context info like subject line, recipient info, or template parameters used).

Explanation: The notifications collection holds each notification that has been generated and sent to users. Storing these in a NoSQL database enables handling a massive volume of events: every notification for every user can be logged without impacting the performance of the transactional system. Each document includes the content and context of the notification, which is useful for building a notification history or inbox feature for users. We include references like template_id to know which template was used (if needed for reconstructing or analyzing messages), and store the actual message content so that the history is preserved even if templates change over time. The channel helps identify how it was delivered (for example, to filter user history by channel). Storing tenant_id and user_id with each record (even though that duplicates relational data) is intentional in NoSQL design to optimize queries by those fields (common access patterns are fetching all notifications for a given user or tenant). This denormalization is acceptable given NoSQL’s focus on read performance and scalability: each notification document carries everything needed to display it, including message content and context, so reads never require joins.

notification_status

(NoSQL collection) Tracks the delivery status of notifications, separate from the notification content. This is used to monitor whether notifications have been delivered, read, failed, etc., and to update their status over time.

Field Name	Data Type	Description
status_id	UUID/String (PK)	Unique identifier for this status record (could be a unique ID or use composite key of notification+timestamp).
notification_id	UUID/String	Identifier of the notification this status update corresponds to (foreign key reference to a record in `notifications` collection).
user_id	INT or String	ID of the user who is the recipient of the notification (for convenient querying by user, duplicates from `notifications`).
status	STRING	Delivery status value (e.g., “sent”, “delivered”, “opened”, “failed”, “read”).
updated_at	DATETIME	Timestamp when this status was recorded (when the status changed or was reported).
details	JSON (optional)	Additional details about the status, if any (e.g., error message if failed, device info for delivery, or read timestamp).

Explanation: The notification_status collection is designed to keep track of the evolving status of each notification. By separating status tracking into its own collection, the system can record multiple status changes over time without rewriting the main notification record. For example, a notification might be sent at time A, delivered at time B, and read by the user at time C – each of these can be a separate entry in notification_status. This separation improves write performance (as status updates are high-frequency and can be appended as new documents) and avoids contention on the main notification record. It also allows querying the status history independently (for analytics or troubleshooting). Each status entry references the notification_id and the user_id for ease of lookup. This design is similar to having a NotificationReceivers/Statuses collection in other systems, which links a notification to its recipient and status. By logging statuses (like read/unread or delivered/failed) separately, we can efficiently track who has seen or received each notification and scale to large numbers of events and users without heavy joins.
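The append-only pattern described above can be sketched with a plain list standing in for the notification_status collection. The `record_status` and `latest_status` helpers are hypothetical; the point is that each status change is a new record and the current status is derived from the newest one.

```python
from datetime import datetime

# Stand-in for the notification_status collection (illustrative only).
status_events = []


def record_status(notification_id, user_id, status, updated_at):
    """Append a new status record; existing records are never modified."""
    status_events.append({
        "notification_id": notification_id,
        "user_id": user_id,
        "status": status,
        "updated_at": updated_at,
    })


def latest_status(notification_id):
    """Return the most recent status for a notification, or None if unknown."""
    events = [e for e in status_events if e["notification_id"] == notification_id]
    if not events:
        return None
    return max(events, key=lambda e: e["updated_at"])["status"]


record_status("n-1", 42, "sent", datetime(2024, 1, 1, 9, 0, 0))
record_status("n-1", 42, "delivered", datetime(2024, 1, 1, 9, 0, 2))
record_status("n-1", 42, "read", datetime(2024, 1, 1, 9, 5, 0))
print(latest_status("n-1"))      # read
print(len(status_events))        # 3
```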

Dead Letter Queue (DLQ) storage: This is for notifications that failed to send after all retry attempts. We can have a separate persistent store or queue for these. A simple approach: a table FailedNotifications(notification_id, user_id, channel, error_code, last_attempt_time) that the ops team can review. Or integrate with a message queue’s DLQ feature (e.g., a Kafka topic or RabbitMQ DLQ) where messages go after max retries. These records might be periodically reprocessed (maybe after the underlying issue is resolved) or at least analyzed. They are also useful for debugging issues with providers.

7. Detailed Component Design

Notification Scheduler & Batch Processing

To support scheduled and batch notifications, a Scheduler component is included. This could be as simple as a database of future notifications with a background job, or a distributed scheduling system. One implementation: use a table ScheduledNotifications(id, user, content, channels, send_time) where notifications are stored if send_time is in the future. A Scheduler Service periodically scans for due notifications (or a lightweight cron job queries for items where send_time <= now()), then enqueues them to the Notification Queue for delivery. For precision (when a large batch needs to go out exactly at a specific time), the system might pre-fetch those notifications slightly before due time and stage them. In high-scale cases, a more sophisticated approach is needed: e.g., use a delayed queue or priority queue data structure. Some message queues support delayed delivery: AWS SQS offers delay timers (capped at 15 minutes), and RabbitMQ can do it through its delayed-message exchange plugin or a TTL combined with dead-lettering. Another approach is to partition scheduled jobs by time (e.g., per minute) and have distributed workers pick up the correct bucket at the right time. The key is that scheduling logic runs in a fault-tolerant way: possibly multiple scheduler nodes for redundancy, using locks or leader election to avoid duplicate sends. If a scheduled notification is missed (e.g., the scheduler was down at that minute), the system should detect and recover or have a manual fallback.
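The "delayed queue or priority queue" idea can be sketched in memory with Python's `heapq`. The `DelayQueue` class is a hypothetical illustration, not a durable scheduler: a real system would persist entries and coordinate across nodes as described above.

```python
import heapq


class DelayQueue:
    """Min-heap keyed by send time; pop_due() returns everything that is due."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal send times pop in insertion order

    def schedule(self, send_time, notification):
        heapq.heappush(self._heap, (send_time, self._seq, notification))
        self._seq += 1

    def pop_due(self, now):
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, notification = heapq.heappop(self._heap)
            due.append(notification)
        return due


queue = DelayQueue()
queue.schedule(100, "reminder-a")
queue.schedule(50, "reminder-b")
queue.schedule(200, "reminder-c")
print(queue.pop_due(now=100))  # ['reminder-b', 'reminder-a']
print(queue.pop_due(now=100))  # []
```

Send times are plain numbers here (for example, epoch seconds) to keep the sketch deterministic.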

For batch campaigns (e.g., sending a marketing email to 10 million users), it’s inefficient for a client to call the API 10 million times. Instead, we might provide a bulk interface or offline loading mechanism. For instance, an admin could upload a list of target user IDs for the campaign. The system (as described by the Duolingo case) might pre-fetch those users’ data, store the list in a file or DB, and then, when triggered, rapidly push all those notifications through the pipeline. The workers and queue must be scaled up to handle this burst. To avoid overwhelming downstream providers (like an email service), we might intentionally throttle the consumption rate or spread the load over multiple provider accounts.

Worker Architecture and Queuing Mechanisms

The Channel Processors (workers) are designed to be stateless and horizontally scalable. Each worker instance is a consumer from the queue. For a high-throughput system, a distributed log/queue like Kafka is a good choice because it can handle very high message rates and partition the stream across many consumers. We would create (for example) separate Kafka topics for Email, SMS, Push, etc., each with multiple partitions. The number of partitions defines the maximum parallelism: within a consumer group, each partition is read by at most one consumer. If we anticipate 100k notifications/sec at peak, and a single consumer can process, say, 1k messages/sec, we’d need on the order of 100 consumers working in parallel across partitions. We might allocate, say, 50 partitions for email, 20 for SMS (if volume is lower), 30 for push, etc., based on expected load per channel. Each partition could be processed by one consumer instance, and we can always add more consumers up to the partition count. Workers can be auto-scaled based on queue backlog or system load. Each message includes a unique notification ID and perhaps a deduplication key (especially with at-least-once delivery semantics, so that if the same message reappears due to a retry, the worker can detect it has seen that ID before and skip or update instead of sending a duplicate). Because workers are stateless, the retry count and delivery status should travel with the message (e.g., in its metadata) or live in a shared store rather than only in worker memory.
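The sizing arithmetic above can be written out as a short calculation. The peak rate, per-consumer rate, and partition counts are the figures from the text; the percentage split per channel is an assumption chosen to reproduce the 50/20/30 allocation.

```python
def consumers_needed(rate_per_sec, per_consumer_per_sec):
    """Ceiling division: the smallest consumer count that covers the rate."""
    return -(-rate_per_sec // per_consumer_per_sec)


PEAK_PER_SEC = 100_000          # from the text
PER_CONSUMER_PER_SEC = 1_000    # from the text
CHANNEL_SHARE_PCT = {"email": 50, "sms": 20, "push": 30}  # assumed traffic split

partitions = {
    channel: consumers_needed(PEAK_PER_SEC * pct // 100, PER_CONSUMER_PER_SEC)
    for channel, pct in CHANNEL_SHARE_PCT.items()
}

print(consumers_needed(PEAK_PER_SEC, PER_CONSUMER_PER_SEC))  # 100
print(partitions)  # {'email': 50, 'sms': 20, 'push': 30}
```

In practice some headroom above these minimums is usually added, since a partition count is harder to change later than a consumer count.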

Queue Semantics: The queue should guarantee at-least-once delivery of messages to the workers, meaning a message will be redelivered if a worker crashes mid-processing. Kafka provides at-least-once delivery when consumers commit offsets only after processing a message. We could also use a system like RabbitMQ, which can requeue un-ACKed messages. In either case, if a worker fails after partially sending, we need idempotency to avoid duplicates. Alternatively, an exactly-once pipeline could be built with more complexity (e.g., Kafka transactions, idempotent producers/consumers). Many large systems choose at-least-once with de-duplication on the consumer side as needed. Deduplication could involve checking a cache or database of recently sent notification IDs.

No Prioritization: Since “no prioritization between channels” is required, we are not weighting the queue processing in favor of any channel. All channel topics are processed as fast as possible. Priorities can still be handled within a channel if needed (for example, maybe an SMS could have normal vs high priority messages in different queues), but the problem statement says no prioritization across channels, so we treat them equally.

Ordering Considerations: Generally, notifications to the same user via the same channel should be delivered in order of generation. Using a partitioning scheme where the partition key is the user ID (or some hash of it) can ensure that all notifications for a given user go to the same partition (and thus are processed in sequence). This prevents, say, an “Order Shipped” notification from overtaking the “Order Placed” notification for the same user. Cross-user ordering doesn’t matter. Partitioning by user also distributes load roughly evenly if user IDs are random. If certain users generate a lot of notifications (e.g. a very active user), that partition could see heavier load – in extreme cases we might partition by a composite key (user and maybe notification type) or just accept some imbalance for ordering guarantees. Another approach is partition by notification type or channel-specific logic (ensuring, for example, all promotional emails are spread out). The design can incorporate sharding at multiple levels: e.g., user-based sharding for preferences DB and partitioning by time for logs, but for the queue, user-based partitioning is a simple strategy to preserve order per recipient.
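User-based partitioning can be sketched as a stable hash of the user ID modulo the partition count. The `partition_for` function is a hypothetical stand-in: Kafka's own default partitioner uses a different hash, but the property that matters here is the same, namely that one user always maps to one partition.

```python
import hashlib


def partition_for(user_id, num_partitions):
    """Map a user ID to a partition using a hash that is stable across processes."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# All notifications for the same user land on the same partition,
# so they are consumed in the order they were produced.
first = partition_for("user-42", 50)
second = partition_for("user-42", 50)
print(first == second)  # True
```

Python's built-in `hash()` is avoided on purpose, because string hashes are randomized per process and would break the mapping between producers.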

Retry and Failure Handling

Retry Policies: Each channel processor will implement a retry mechanism for transient failures. A common strategy is exponential backoff: if a send fails, wait a short interval and try again, increasing the wait time on each failure. For example, wait 1s after the first failure, 5s after the second, then 30s, and so on. We will configure a maximum number of retries (say 3) per notification per channel. Some channels might have specific retry rules: e.g., for SMS, if the first attempt fails due to a carrier issue, it might be pointless to retry quickly, so we might retry after a longer delay or use an alternate SMS provider if available. For email, a common pattern is to retry a couple of times over a few minutes for temporary SMTP issues. Push notifications typically either succeed or not; if a device token is invalid, a retry won’t help (the failure is permanent for that token). So the worker should classify errors as permanent or transient. Permanent errors (invalid address, unregistered device, etc.) are not retried; they are logged and dropped immediately. Transient ones (server timeouts, rate limit exceeded, etc.) trigger the retry logic.
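A minimal sketch of this retry policy follows, using the 1s/5s/30s delays from the example. The exception classes and `send_with_retry` are hypothetical names; the `sleep` parameter is injectable so the demo does not actually wait.

```python
import time


class TransientError(Exception):
    """Worth retrying: timeouts, rate limits, temporary provider errors."""


class PermanentError(Exception):
    """Not worth retrying: invalid address, unregistered device token."""


def send_with_retry(send, delays=(1, 5, 30), sleep=time.sleep):
    """Call send(); retry transient failures after each delay.

    Returns the number of attempts used. Permanent errors, and transient
    errors that outlast all retries, are re-raised to the caller.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            send()
            return attempt
        except PermanentError:
            raise  # retrying cannot help
        except TransientError:
            if attempt > len(delays):
                raise  # retries exhausted; caller can route to a DLQ
            sleep(delays[attempt - 1])


# Demo: a send that times out twice, then succeeds.
outcomes = iter([TransientError("timeout"), TransientError("timeout"), None])


def flaky_send():
    outcome = next(outcomes)
    if outcome is not None:
        raise outcome


waits = []
print(send_with_retry(flaky_send, sleep=waits.append))  # 3
print(waits)  # [1, 5]
```

Random jitter is commonly added to each delay so that many workers do not retry at the same instant.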

The system can implement retries in a few ways. One is in the worker process itself with an in-memory timer or loop. Another robust way at scale is to publish the failed message to a retry queue (or back onto the main queue with a delay). For instance, if using Kafka, we might have separate topics like Email_Retry or use a field in the message for attempt count and have workers requeue the message with an incremented attempt count and a timestamp to not process until a certain time. Some setups use a dead-letter queue pattern: after N failed attempts, the message goes to a DLQ instead of retrying further.

Deduplication: With at-least-once delivery and retries, deduplicating notifications is important so users don’t get spammed with duplicates. We handle this by using a unique notification ID for each notification request (the API should provide or the system generates one). This ID travels with all channel messages. Workers or the Notification Service can keep a cache of recently seen IDs or check in the database before sending. For example, the Notification Service could store a short-lived record of "notification request processed" keyed by that ID, so if the same request comes again (due to client retry or message duplication), it ignores it. On the consumer side, if a worker sees a message ID that was already processed (could check a Redis set or an entry in the logs), it will skip sending. Idempotent operations are also key: if we accidentally send the same email twice, many email providers have ways to detect duplicate IDs if we pass a message ID, but we shouldn’t rely solely on that. Within our system, a combination of careful commit handling in the queue and idempotency checks prevents duplicates from normal operation. The Duolingo example explicitly mentioned ensuring no duplicate messages even if two triggers fired, highlighting the need for idempotency in the face of concurrent events.
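The "cache of recently seen IDs" can be sketched as an in-memory map with expiry. `RecentlySeen` is a hypothetical class for illustration; in a multi-worker deployment the same check would live in a shared store, for example a Redis `SET` with the `NX` and `EX` options.

```python
import time


class RecentlySeen:
    """Remembers notification IDs for ttl_seconds to suppress duplicate sends."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}

    def first_time(self, notification_id):
        """Return True if the ID is new (and record it), False if it is a duplicate."""
        now = self._clock()
        # Drop entries whose TTL has passed so the map does not grow forever.
        self._expires_at = {k: v for k, v in self._expires_at.items() if v > now}
        if notification_id in self._expires_at:
            return False
        self._expires_at[notification_id] = now + self._ttl
        return True


seen = RecentlySeen(ttl_seconds=60)
print(seen.first_time("notif-123"))  # True  -> send it
print(seen.first_time("notif-123"))  # False -> skip the duplicate
```

The TTL should be longer than the longest retry window, otherwise a late redelivery could slip through as new.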

Failure Handling and Fallbacks: Not all failures are temporary. If a channel is down or a third-party service is having an outage, we may want to failover or fallback. For instance, if our primary SMS provider fails, a secondary provider could be used. The system can be configured with multiple integrations per channel, ranked by preference. On failure to deliver via the first, it can try the second. This improves reliability at the cost of complexity and possibly higher cost. Large systems often integrate multiple vendors for critical channels (e.g., two email service providers) and implement logic to route around outages. Similarly, if sending push notifications, if one region of FCM is unresponsive, we might try another region’s endpoint. These fallbacks should still obey the no-duplication rule (only one should ultimately succeed).
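Provider failover in preference order might look like the sketch below. The provider names and the `send_with_fallback` function are hypothetical; each provider is represented by a plain callable, and the loop stops at the first success so only one provider delivers the message.

```python
def send_with_fallback(providers, message):
    """Try providers in preference order; return the name of the one that succeeded."""
    failed = []
    for name, send in providers:
        try:
            send(message)
            return name  # stop here so only one provider delivers
        except Exception:
            failed.append(name)
    raise RuntimeError(f"all providers failed: {failed}")


def primary_sms(message):
    raise ConnectionError("primary provider outage")  # simulated outage


delivered = []


def secondary_sms(message):
    delivered.append(message)


used = send_with_fallback(
    [("primary", primary_sms), ("secondary", secondary_sms)],
    "Your code is 123456",
)
print(used)       # secondary
print(delivered)  # ['Your code is 123456']
```

One caveat: a timeout from the first provider does not prove the message was not sent, so falling back after an ambiguous failure can still produce a duplicate.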

If a notification ultimately fails after retries (e.g., email bounced, or we exhausted retries for a transient error), the system should record that failure in the log and possibly move the notification to a Dead Letter Queue for manual review. Operators can inspect DLQ messages and decide to reprocess them later or alert the source service that the notification was not delivered. For example, if an email keeps bouncing, perhaps the user’s email is invalid – the system could flag that user’s email contact in the preferences as invalid to avoid future attempts.

Graceful Degradation: Under extreme load or partial failures, the system should degrade gracefully. For instance, if the notification database is down, the system might still attempt to send notifications but skip logging to the DB (or log to an in-memory buffer or fallback store to flush later). If the queue is backed up (indicating downstream slowness), the rate of accepting new requests might be throttled via the API gateway or using backpressure signals. The idea is to prevent total crashes – e.g., shed non-critical load if needed (maybe drop or delay lower priority notifications, though by requirement we aren’t prioritizing channels, an extension could be prioritizing notification types in an overload scenario).

Additional Features

Rate Limiting per User/Channel: To prevent a user from being overwhelmed by notifications (especially promotional), the system should enforce limits. For example, not more than X promotional notifications per day per user. This requires counting notifications sent (which could be done via the logs or a separate counter service) and checking before sending. If a limit is exceeded, the system can defer or drop some notifications (or combine them into a digest). This is typically part of the business logic in the Notification Service or User Preferences evaluation.
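A per-user daily cap can be sketched with a counter keyed by user and day. `DailyLimiter` is a hypothetical in-memory class; a shared counter service or cache would be needed once several service instances enforce the same limit.

```python
from collections import defaultdict


class DailyLimiter:
    """Allows at most max_per_day notifications per (user, day) pair."""

    def __init__(self, max_per_day):
        self._max = max_per_day
        self._counts = defaultdict(int)

    def allow(self, user_id, day):
        """Return True and count the send if the user is under the cap."""
        key = (user_id, day)
        if self._counts[key] >= self._max:
            return False  # caller can defer, drop, or fold into a digest
        self._counts[key] += 1
        return True


limiter = DailyLimiter(max_per_day=2)
print([limiter.allow(42, "2024-01-01") for _ in range(3)])  # [True, True, False]
print(limiter.allow(42, "2024-01-02"))                      # True (new day)
```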

Notification Format and Personalization: The system design allows personalization by using templates and data. It also should handle localization (different languages for notifications, possibly choosing template based on user locale). Templates and content might be stored or managed through a CMS that the Notification Service can query, but at design time it suffices to note templates exist and are fetched when composing messages.

Security Considerations: Ensure that the Notification Service APIs are protected (e.g., with OAuth tokens or API keys) so that only authorized systems (like the company’s own backends) can send notifications; otherwise spammers could abuse it. Also, validate content to prevent injection (especially if content might be HTML for email, or if we allow some rich content in notifications). The system should also not leak data: authorization checks must ensure that one user cannot query another user’s notifications, and that one tenant cannot read another tenant’s data.

8. Scalability and Performance Considerations

Designing for 3B notifications/day requires careful attention to scalability. Key strategies include distributing load, parallelizing work, and eliminating bottlenecks:

Horizontal Scaling & Load Balancing: All components are horizontally scalable. We will run multiple instances of the Notification Service behind a load balancer, so incoming API requests are distributed evenly. Similarly, we scale out channel worker instances – if more throughput is needed for email sending, we add more email worker processes or machines, each consuming from the queue partitions. The queue system (Kafka, for example) itself runs as a cluster on multiple brokers and can partition data across nodes, allowing it to handle more messages. By scaling horizontally, we can linearly increase capacity by adding more machines rather than needing any single machine to be extremely powerful. The system should be designed such that there is no single point where all traffic funnels through one instance (to avoid overload).
Efficient Queueing and Parallelism: Using a high-throughput distributed queue like Kafka ensures we can ingest and distribute messages quickly. Kafka can handle millions of messages per second in large clusters, so 100k/sec is feasible with a proper setup. We will tune partition counts to achieve the desired parallelism. Additionally, queue producers and consumers should use batching where possible – for example, the Notification Service might send messages to Kafka in batches to reduce overhead, and consumers might commit offsets in batches. The asynchronous design also means the front-end API isn’t waiting on the slowest part of delivery, improving perceived performance.
Caching & Fast Data Access: To keep latency low, we use caching for frequently accessed data like user preferences and templates. A cache like Redis can store user preference records in memory, avoiding a database lookup for each notification. This typically brings the preferences check in the critical path down to around a millisecond or less, compared with several milliseconds for a database query. Similarly, if templates are stored in a DB or fetched from a service, we can cache them in memory within the Notification Service. Another caching aspect is for the read path: when a user fetches notifications, we could cache their last N notifications in memory (or use an in-memory store as a speed layer) to serve the data quickly, especially if they check frequently.
Database Partitioning and Sharding: The data stores will be split to spread load. For example, we might shard the notification logs by user ID or by region. User-based sharding could put users with certain ID ranges or certain geographic regions into separate databases. This prevents any single DB from becoming a write hotspot for all 3B daily notifications. Time-based partitioning of logs is also useful – e.g., have a separate partition/table per day or per month, so that writes for the current day go into a hot partition while old partitions are mostly static (and can even be moved to cheaper storage). This improves insert performance and makes purging old data easier (just drop old partitions). For user preferences, sharding might not be necessary if using a scalable NoSQL store, but if using SQL, we could partition that by user as well. The key is to divide and conquer the data such that no single node or small cluster handles the full 3B/day load.
Load Balancing at External Integrations: Ensure that calls to external providers (email/SMS gateways, push services) are also load balanced. For example, use multiple SMTP endpoints or multiple connections to the provider so that one slow connection doesn’t bottleneck the worker. If using cloud provider services (SES, FCM), they typically handle scaling on their side, but we must be mindful of their rate limits and possibly request increases in quota. For SMS, if using Twilio, we might use multiple phone numbers or messaging service pools to send in parallel at scale. If any single provider cannot handle our peak (e.g., an SMS vendor might throttle), we should integrate additional providers and split traffic, as mentioned earlier for reliability.
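As a quick sanity check on the numbers used in this section, the calculation below converts the 3B/day target into an average per-second rate and compares it with the 100k/sec peak figure. Both inputs come from the text; nothing else is assumed.

```python
NOTIFICATIONS_PER_DAY = 3_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
PEAK_PER_SEC = 100_000

average_per_sec = NOTIFICATIONS_PER_DAY / SECONDS_PER_DAY
peak_to_average = PEAK_PER_SEC / average_per_sec

print(round(average_per_sec))     # 34722
print(round(peak_to_average, 2))  # 2.88
```

So the design has to sustain roughly 35k notifications per second on average, with peaks close to three times that.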
