
Designing Gmail (difficulty: medium)


Let’s design a distributed email service similar to Gmail that allows millions of users to send, receive, and organize messages asynchronously. In essence, this service acts as a massive digital post office: it accepts incoming mail, routes it to the correct recipient’s storage, and provides a user-friendly interface for reading, searching, and managing conversations. Key entities in the system include:

Gmail Key Entities

Step 1: Clarify and Define Requirements

Functional Requirements:

Non-Functional Requirements:

With these requirements clarified, we can proceed to capacity estimates to ensure the design meets the scale.

Step 2: Back-of-the-Envelope Capacity Estimation

Before choosing specific technologies, it's important to gauge the scale of the system:

These rough estimates highlight the immense scale of a Gmail-like service. The system must distribute load across thousands of servers. They also point to specific design priorities: the storage system must handle exabytes of data; the email-processing pipeline must sustain millions of emails per second; and read-heavy access patterns make caching and fast indexes crucial. With capacity in mind, we can proceed to the API and high-level system design.

Step 3: RESTful APIs

Designing a RESTful email service (similar to Gmail) involves creating clear, stateless endpoints that cover core email operations. Each API uses standard HTTP methods and JSON data formats for ease of use on web and mobile clients.

I. Send Email API (POST /emails/send)

This endpoint allows the client to compose and send an email.

Request Structure

Example Request:

POST /emails/send

{
  "to": ["alice@example.com"],
  "cc": ["bob@example.com"],
  "subject": "Meeting Notes",
  "body": "Hi Alice,\nPlease find the notes attached.\nRegards,\nBob",
  "attachments": [
    {
      "filename": "Notes.pdf",
      "contentType": "application/pdf",
      "content": "<base64-encoded file data>"
    }
  ]
}

Response Structure

On success, the API returns a confirmation that the email was sent (or at least queued for sending). A typical response is 201 Created (since a new email resource is created in the Sent mailbox) along with the new email’s metadata (e.g. its email_id). For faster client feedback, the server might queue the email for sending and return 202 Accepted instead – the email will be sent asynchronously, which improves responsiveness for the mobile/web app. The response body could include the email_id and status:

Example Success Response:

HTTP/1.1 201 Created

{
  "email_id": "12345",
  "status": "sent",
  "sent_at": "2025-02-07T23:08:00Z"
}

If the request is missing required fields or is malformed, the server returns 400 Bad Request with an error message. For example, forgetting the to field or providing an invalid email address would yield a 400 error. The error response body might look like:

HTTP/1.1 400 Bad Request

{
  "error": "Recipient list is required."
}
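The validation and status-code behavior described for this endpoint can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's actual implementation: the function name handle_send, the simplified address pattern, and the in-memory outbound_queue are all hypothetical, and the sketch takes the asynchronous path (202 Accepted) rather than 201 Created.

```python
import re
import uuid

# Deliberately simplified address check (assumption); real validation is stricter.
ADDRESS_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

outbound_queue = []  # hypothetical stand-in for a durable message queue


def handle_send(payload):
    """Return (http_status, response_body) for a POST /emails/send request."""
    recipients = payload.get("to") or []
    if not recipients:
        return 400, {"error": "Recipient list is required."}

    all_addresses = list(recipients) + list(payload.get("cc") or [])
    invalid = [addr for addr in all_addresses if not ADDRESS_RE.match(addr)]
    if invalid:
        return 400, {"error": "Invalid email address: " + invalid[0]}

    # Queue the message and answer immediately; delivery happens asynchronously.
    email_id = uuid.uuid4().hex
    outbound_queue.append({"email_id": email_id, **payload})
    return 202, {"email_id": email_id, "status": "queued"}
```

Returning 202 keeps the client responsive, at the cost that the client must learn the final delivery outcome some other way (for example, by fetching the email later).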

II. Fetch Emails API (GET /emails/inbox)

This endpoint retrieves a list of emails in the user’s inbox. It returns a paginated collection, typically sorted by date (newest first by default). The response contains summary information for each email, allowing a client to show an inbox overview (sender, subject, snippet, read/unread status, etc.) without downloading full email bodies. By paginating results, the service avoids sending potentially thousands of emails in one response, which would be slow and resource-intensive. The client can request subsequent pages as the user scrolls or navigates, which is crucial for performance on mobile networks.

Request Structure

Example Request:

GET /emails/inbox?page=1&page_size=50
Authorization: Bearer <token>

This would fetch the first 50 emails in the inbox. The Authorization header (e.g., an OAuth2 token) is used to identify the user’s account.

Response Structure

A successful response returns 200 OK with a JSON body containing an array of email summaries and pagination info (such as a next page token or page number). For example:

HTTP/1.1 200 OK

{
  "emails": [
    {
      "email_id": "12345",
      "from": "charlie@example.com",
      "subject": "Project Update",
      "snippet": "Hi, here is the update on project X...",
      "date": "2025-02-07T20:15:00Z",
      "is_read": false,
      "has_attachments": true
    },
    {
      "email_id": "12344",
      "from": "david@example.com",
      "subject": "Re: Meeting Notes",
      "snippet": "Thanks for the notes. I have a question about...",
      "date": "2025-02-07T18:02:00Z",
      "is_read": true,
      "has_attachments": false
    }
    // ... up to page_size emails
  ],
  "page": 1,
  "page_size": 50,
  "total_count": 1240,
  "total_pages": 25,
  "next_page": 2
}

Each email item is a summary that includes fields like email_id, sender (from), subject, a short snippet of the body, the sent/received date, read status, and a flag for attachments. By not including the full body and attachments in this list, the payload stays small and quick to transfer on mobile devices. If the client needs more details, it will call the Get Email Details API for a specific email.
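The pagination fields in the inbox response can be derived with a few lines of arithmetic. The sketch below assumes simple page-number (offset) pagination over an in-memory list; the function name paginate is hypothetical, and a production service might use an opaque next-page token instead.

```python
import math


def paginate(items, page, page_size):
    """Slice a newest-first list of email summaries into one page."""
    total_count = len(items)
    total_pages = math.ceil(total_count / page_size) if total_count else 0
    start = (page - 1) * page_size
    return {
        "emails": items[start:start + page_size],
        "page": page,
        "page_size": page_size,
        "total_count": total_count,
        "total_pages": total_pages,
        # None signals that there is no further page to request.
        "next_page": page + 1 if page < total_pages else None,
    }


# 1240 emails at 50 per page gives ceil(1240 / 50) = 25 pages.
inbox = [{"email_id": str(i)} for i in range(1240, 0, -1)]
first = paginate(inbox, page=1, page_size=50)
print(first["total_pages"], first["next_page"])  # 25 2
```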

III. Get Email Details API (GET /emails/{email_id})

This endpoint returns the full content of a specific email, identified by its email_id. A client (web or mobile) would call this when a user opens an email to read it. It returns all relevant details: headers (from, to, cc, date), subject, the body (which may have HTML and/or plain text), and attachment metadata. Attachments themselves can be included or provided via separate links. The goal is to provide everything needed to display or process the email message.

Request Structure

Example Request:

GET /emails/12345
Authorization: Bearer <token>

Response Structure

On success (200 OK), the response is a JSON object containing the email’s details. For example:

HTTP/1.1 200 OK

{
  "email_id": "12345",
  "from": "charlie@example.com",
  "to": ["me@example.com"],
  "cc": ["alice@example.com"],
  "subject": "Project Update",
  "date": "2025-02-07T20:15:00Z",
  "body_text": "Hi,\nHere is the update on project X...\nRegards,\nCharlie",
  "body_html": "<p>Hi,<br>Here is the update on <b>project X</b>...<br>Regards,<br>Charlie</p>",
  "attachments": [
    {
      "attachment_id": "att-98765",
      "filename": "ProjectX-Report.pdf",
      "contentType": "application/pdf",
      "size": 502400,
      "download_url": "https://api.example.com/emails/12345/attachments/att-98765"
    }
  ],
  "is_read": false,
  "folder": "INBOX"
}

IV. Search Emails API (GET /emails/search)

This endpoint allows the client to search for emails matching certain criteria. Users can find emails by subject keywords, sender, recipients, or date ranges. The API accepts query parameters as filters and returns a list of emails (in the same summary format as the inbox list) that match.

Request Structure

The client can mix these parameters. For example:

GET /emails/search?from=alice@example.com&after=2025-01-01&q=report&page=1&page_size=20

This search would look for emails from Alice, after January 1, 2025, that contain the word “report”, returning the first 20 results. The server should handle the combination of filters gracefully (treating them as AND conditions). If no filters are provided, the API could default to returning an empty result or all emails, though typically at least one filter is used.

Response Structure

A successful search returns 200 OK with a JSON body similar to the inbox listing, but containing only emails matching the query. For example:

HTTP/1.1 200 OK

{
  "emails": [
    {
      "email_id": "12700",
      "from": "alice@example.com",
      "subject": "Project Report Q4",
      "snippet": "Attached is the Q4 project report you requested...",
      "date": "2025-01-15T09:24:00Z",
      "is_read": true,
      "has_attachments": true
    },
    {
      "email_id": "12710",
      "from": "alice@example.com",
      "subject": "RE: Project Report Q4",
      "snippet": "Thanks for the report. I have some comments...",
      "date": "2025-01-16T11:00:00Z",
      "is_read": false,
      "has_attachments": false
    }
    // ... up to page_size results
  ],
  "page": 1,
  "page_size": 20,
  "next_page": 2,
  "total_matches": 35
}
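The AND combination of search filters can be illustrated with a small in-memory sketch. The function name search_emails and the sample data are hypothetical; the sketch also assumes that the after filter includes the given day and that q is a case-insensitive substring match on subject and snippet, neither of which the API description pins down.

```python
from datetime import date


def search_emails(emails, sender=None, after=None, q=None):
    """Return emails that satisfy every supplied filter (AND semantics)."""
    matches = []
    for email in emails:
        if sender is not None and email["from"] != sender:
            continue
        # "after" is treated as inclusive of the given day (an assumption).
        if after is not None and email["date"] < after:
            continue
        if q is not None:
            text = (email["subject"] + " " + email["snippet"]).lower()
            if q.lower() not in text:
                continue
        matches.append(email)
    return matches


emails = [
    {"email_id": "12700", "from": "alice@example.com", "subject": "Project Report Q4",
     "snippet": "Attached is the Q4 project report you requested...", "date": date(2025, 1, 15)},
    {"email_id": "12710", "from": "alice@example.com", "subject": "RE: Project Report Q4",
     "snippet": "Thanks for the report. I have some comments...", "date": date(2025, 1, 16)},
    {"email_id": "12600", "from": "alice@example.com", "subject": "Holiday plans",
     "snippet": "Are you free next week?", "date": date(2025, 1, 10)},
    {"email_id": "12500", "from": "bob@example.com", "subject": "Report draft",
     "snippet": "Draft attached.", "date": date(2025, 1, 12)},
    {"email_id": "11900", "from": "alice@example.com", "subject": "Old report",
     "snippet": "Last year's report.", "date": date(2024, 12, 20)},
]

hits = search_emails(emails, sender="alice@example.com", after=date(2025, 1, 1), q="report")
print([e["email_id"] for e in hits])  # ['12700', '12710']
```

With no filters supplied, this sketch returns every email; a real service would choose between that and an empty result, as noted above.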

Step 4: High-Level System Design

At a high level, the email service will use a distributed, microservices-based architecture. Different components handle different functions (sending mail, storing mail, searching, spam filtering, etc.), and they communicate over well-defined APIs. Below are the major components and an overview of how an email flows through the system:

Major Components:

High-level Design of Email Service

Data Flow Overview:

Let’s walk through key scenarios: receiving an email, sending an email, and reading/searching an email.

Receiving Email (Inbound flow)
Sending Email (Outbound flow)

Trade-offs in High-Level Design:

With the high-level design in mind, let’s dig deeper into data schema choices, data management, and detailed component design for each part of the system.

Step 5: Database Design

In this section, we explore how data is structured and managed within the system and design details of core components. We also discuss trade-offs in choosing certain technologies or approaches.

Email Storage Model

To handle billions of emails, we will separate our storage into a Metadata Store, a Document Store, and an Object Store:

We have two types of data to be stored: 1) Structured: This is small, structured info about users and emails (like sender, date sent, etc.), and 2) Unstructured: Large data consisting of email body and attachments.

We can use SQL or NoSQL for structured data. NoSQL stores offer flexibility and easier scalability whereas relational SQL stores are easier for complex queries and ensure consistency (ACID transactions for updates). However, at the scale of billions of messages, a single relational instance won’t suffice – we would need to partition it.

One reasonable choice is to go with a relational SQL database for structured metadata and a NoSQL store for unstructured content. The SQL part holds critical metadata (user accounts, email headers, indexes, relationships) with ACID compliance for consistency, while the NoSQL part holds email bodies and attachments to scale horizontally to billions of records. This hybrid approach leverages the strengths of each system – structured queries and strong consistency from SQL, and flexibility and scalability from NoSQL. Below, we will discuss the SQL schema and how it meets requirements for efficient retrieval, fast search, sharding, and multi-region distribution.

SQL Schema for Metadata (Relational)

The following SQL schema defines tables for users, emails, recipients, and attachments metadata. These tables use primary/foreign keys to enforce relationships (e.g. each email is linked to a user and recipients). We also define indexes on key fields like subject, sender, and date for fast lookup. All data types are chosen to balance storage and performance (e.g. using BIGINT for IDs, text types for subject and addresses, etc.). The schema ensures strong consistency for critical data – for example, inserting a new email’s metadata and linking it to a user and recipients can be done transactionally in SQL.

Users Table (SQL):

This table stores user account information.

Field | Data Type | Description
user_id (PK) | BIGINT | Unique user identifier (primary key)
username | VARCHAR(100) | Username or email address (unique)
created_at | DATETIME | Account creation timestamp
status | VARCHAR(20) | Account status (e.g., active)

Emails Table (SQL)

Stores structured email header data (one row per email). Indexed columns are noted for fast search by subject, sender, or date.

Field | Data Type | Description
email_id (PK) | BIGINT | Unique email identifier (primary key)
user_id (FK) | BIGINT | Owner/recipient user’s ID (foreign key to Users)
sender | VARCHAR(255) | Sender email address (or user ID if internal)
subject | VARCHAR(255) | Subject line of email (indexed)
date_sent | DATETIME | Sent/received date and time (indexed)
has_attachments | BOOLEAN | Whether email has attachments (for quick filtering)
folder_label | VARCHAR(50) | Folder/label (e.g. "Inbox", "Sent")

Indexes: Indexes on subject, sender, and date_sent allow fast queries by these fields. For example, B-tree indexes on subject and sender support exact-match and prefix lookups without scanning the entire table; matching a keyword in the middle of a subject line needs a full-text index instead. A composite index on (user_id, date_sent) can optimize retrieving all emails for a user sorted by date.
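To make the composite index concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for the production SQL database (an assumption made purely so the example is runnable). The index names and the helper functions inbox_page and query_plan are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emails (
    email_id  INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,
    sender    TEXT NOT NULL,
    subject   TEXT,
    date_sent TEXT NOT NULL  -- ISO-8601 UTC strings sort chronologically
);
CREATE INDEX idx_emails_user_date ON emails (user_id, date_sent);
CREATE INDEX idx_emails_sender ON emails (sender);
""")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?, ?, ?)",
    [
        (1, 42, "charlie@example.com", "Project Update", "2025-02-07T20:15:00Z"),
        (2, 42, "david@example.com", "Re: Meeting Notes", "2025-02-07T18:02:00Z"),
        (3, 7, "alice@example.com", "Hello", "2025-02-06T09:00:00Z"),
    ],
)

INBOX_SQL = (
    "SELECT email_id, subject FROM emails "
    "WHERE user_id = ? ORDER BY date_sent DESC LIMIT ?"
)


def inbox_page(user_id, limit):
    """Newest-first email summaries for one user."""
    return conn.execute(INBOX_SQL, (user_id, limit)).fetchall()


def query_plan(sql, params):
    """Return the planner's description of how it would run the query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


print(inbox_page(42, 10))
print(query_plan(INBOX_SQL, (42, 10)))
```

Inspecting the query plan is a quick way to confirm that the inbox query is served by the (user_id, date_sent) index rather than a full table scan.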

Email_Recipients Table (SQL)

Maps emails to their recipients (to handle multiple recipients per email). This table captures the relationship between emails and users (or external addresses).

Field | Data Type | Description
email_id (FK) | BIGINT | Email ID (foreign key to Emails table)
recipient_address | VARCHAR(255) | Recipient’s email address (internal or external)
recipient_user_id | BIGINT (nullable) | If recipient is a registered user, their user_id

(Primary key is a combination of email_id and recipient_address to ensure uniqueness per recipient.) This structure lets us query which emails were sent to a given address if needed. For internal users, recipient_user_id links to the Users table; otherwise, the email address is stored as text.

Attachments Table (SQL)

Stores metadata for email attachments (the files themselves live in the NoSQL/object store).

Field | Data Type | Description
attach_id (PK) | BIGINT | Attachment ID (primary key)
email_id (FK) | BIGINT | Email ID that this attachment belongs to
file_name | VARCHAR(255) | Original filename of the attachment
content_type | VARCHAR(100) | MIME type (e.g. "image/png", "application/pdf")
file_size | BIGINT | Size of the attachment in bytes
storage_key | VARCHAR(255) | Key/URL for the attachment in NoSQL storage

The storage_key is a pointer to where the actual attachment binary is stored in the NoSQL system (e.g. an object storage URI or document ID). This allows the relational DB to hold needed info for listing attachments without storing large blobs.

Note: All the above tables use foreign keys to maintain referential integrity (e.g., an email’s user_id must exist in Users). The SQL database’s transaction support ensures that when a new email is received, all related inserts (email metadata, recipient rows, attachment metadata) can be committed atomically, preserving consistency.
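The atomic, all-or-nothing insert described in the note can be sketched with sqlite3 standing in for the production database (an assumption; the trimmed-down tables and the function name store_email are for illustration only). The with conn: block commits if every statement succeeds and rolls back if any of them raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL
);
CREATE TABLE emails (
    email_id  INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    sender    TEXT NOT NULL,
    subject   TEXT,
    date_sent TEXT NOT NULL
);
CREATE TABLE email_recipients (
    email_id          INTEGER NOT NULL REFERENCES emails(email_id),
    recipient_address TEXT NOT NULL,
    recipient_user_id INTEGER REFERENCES users(user_id),
    PRIMARY KEY (email_id, recipient_address)
);
INSERT INTO users VALUES (1, 'me@example.com');
""")


def store_email(email_id, user_id, sender, subject, date_sent, recipients):
    """Insert the email row and all recipient rows, or nothing at all."""
    with conn:  # one transaction: commit on success, rollback on any error
        conn.execute(
            "INSERT INTO emails VALUES (?, ?, ?, ?, ?)",
            (email_id, user_id, sender, subject, date_sent),
        )
        conn.executemany(
            "INSERT INTO email_recipients VALUES (?, ?, ?)",
            [(email_id, address, uid) for address, uid in recipients],
        )


def email_count():
    return conn.execute("SELECT COUNT(*) FROM emails").fetchone()[0]


store_email(100, 1, "charlie@example.com", "Project Update",
            "2025-02-07T20:15:00Z", [("me@example.com", 1)])

try:
    # The duplicate recipient violates the composite primary key, so the
    # email row inserted just before it is rolled back as well.
    store_email(101, 1, "charlie@example.com", "Oops", "2025-02-08T10:00:00Z",
                [("a@example.com", None), ("a@example.com", None)])
except sqlite3.IntegrityError:
    pass

print(email_count())  # 1
```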

Email Body and Attachment Storage

As mentioned above, we will choose a document-oriented NoSQL database for email bodies and a distributed object store for large attachments, achieving flexibility and scale:

NoSQL Data Model Choice: A document store suits email content since an email (body + attachments list) is naturally a self-contained document. It provides flexibility to store variable fields (different emails can have different headers or structure) and can be indexed on certain JSON fields if needed. For attachments, an object store is ideal as it’s optimized for large binary objects and easy to scale (the database only needs to store references to these files). This combination ensures we can scale horizontally without a fixed schema limiting the email content structure.

Efficient Retrieval by Subject, Sender, or Date

To retrieve emails quickly by common attributes like subject, sender, or date, we rely on the SQL metadata store and its indexing:

Why this is efficient: The heavy filtering and searching is done on a strongly consistent, indexed SQL store optimized for such queries, while the bulk data transfer (email body, files) is done directly from a storage optimized for throughput. This separation means we can handle queries like “find all emails from Bob in 2025 with subject containing ‘Project’” quickly via metadata, then pull the needed bodies. Users experience quick search results, as the system avoids scanning large email texts during search – it only loads the text for the results the user needs to open.

Indexing for Fast Search

Proper indexing is critical for performance. In the SQL schema, we create indexes on the columns involved in frequent queries:

By thoughtfully indexing the SQL metadata, we optimize search performance. Indexes allow the database to find the matching records via tree lookups or hash lookups rather than scanning all rows. The trade-off is some extra storage and slightly slower writes (as the index must be updated on inserts), which is acceptable given the read-heavy nature of email retrieval. Fast search is further ensured by keeping the amount of data scanned small – we never scan through entire email bodies during a search-by-subject or sender, we only scan the indexed fields.

Sharding Strategy for NoSQL Storage

To handle billions of emails, the NoSQL data store must be distributed across many servers. We use sharding to split data into partitions across nodes:

Summary: By sharding the NoSQL store, we ensure that no single server handles all email data. This design can handle billions of emails since each shard manages a slice of the data. It improves performance (queries hit only relevant shards) and allows us to add more shards to increase capacity. The key is choosing a shard key that balances load and aligns with access patterns. Sharding by user strikes a good balance for email – it’s how many large email providers distribute data – because it isolates each user’s workload and naturally spreads users across the cluster.
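A minimal sketch of sharding by user follows. It assumes hash-modulo placement with a fixed shard count of 16; both the shard count and the function name shard_for_user are illustrative choices, not details given in this design.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # illustrative shard count (assumption)


def shard_for_user(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to a shard using a stable hash of the ID."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All of one user's emails land on the same shard, so an inbox read
# touches a single partition.
print(shard_for_user(42) == shard_for_user(42))  # True

# Users spread roughly evenly across shards.
load = Counter(shard_for_user(uid) for uid in range(10_000))
print(len(load), min(load.values()), max(load.values()))
```

A stable hash such as SHA-256 is used because Python's built-in hash() for strings varies between processes. Plain modulo placement also remaps most users whenever the shard count changes, which is why consistent hashing or an explicit user-to-shard directory is often preferred for resharding.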

Multi-Region Data Distribution

For global availability and redundancy, the data is replicated across multiple regions (data centers). The goal is to keep the service running and responsive even if one region goes down or if users are geographically distributed:

Overall, multi-region distribution provides redundancy and low latency. Users in, say, North America, Europe, and Asia can all experience fast access with local data copies, and the service can tolerate a data center outage without data loss. This design aligns with the principle of no single point of failure: data is distributed across servers and regions, making the system more resilient.

Trade-offs Between SQL and NoSQL Usage

Using a hybrid SQL+NoSQL design introduces trade-offs. We deliberately use each technology where it’s strongest, but we must balance consistency, complexity, and performance:

In summary, the hybrid design tries to capture the best of both worlds: SQL provides structured, consistent metadata management, and NoSQL provides scalable, high-throughput storage for the bulk of data. The main trade-off is complexity and the need to ensure consistency between the two layers, but with careful schema design (using a common key, and reliable messaging or transactions when linking the two), these challenges are manageable. This design ensures that the email service can scale to billions of messages and global usage without sacrificing the reliable indexing and relationships that users (and the application logic) rely on.

Detailed Component Design

Caching Layer

To ensure low latency for common operations, we add caching:
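As one illustration of such a cache, the sketch below shows a read-through cache with a time-to-live, as might sit in front of the inbox listing query. The class name TTLCache, the key format, and the 30-second TTL are assumptions for the example; a real deployment would more likely use a shared cache service than an in-process dictionary.

```python
import time


class TTLCache:
    """Read-through cache: serve fresh entries, reload expired or missing ones."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self.entries = {}       # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]     # cache hit
        value = loader(key)     # cache miss: fall through to the backing store
        self.entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop an entry, e.g. when a new email arrives for that user."""
        self.entries.pop(key, None)


db_reads = []


def load_inbox_page(key):
    db_reads.append(key)        # stands in for a metadata-store query
    return "summaries for " + key


cache = TTLCache(ttl_seconds=30)
cache.get_or_load("user42:inbox:page1", load_inbox_page)
cache.get_or_load("user42:inbox:page1", load_inbox_page)
print(len(db_reads))  # 1
```

Invalidating on write keeps the inbox listing fresh when new mail arrives, while the TTL bounds staleness if an invalidation is missed.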

Threading (Conversation) Implementation

To support email threading (conversation view):
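One common way to group messages into conversations, assumed here for illustration rather than taken from this design, is to follow the standard Message-ID, In-Reply-To, and References email headers. The function name assign_threads and the dictionary field names are hypothetical.

```python
def assign_threads(messages):
    """Map each message_id to a thread_id, given messages in arrival order.

    A message joins the thread of the first already-known ancestor named in
    its References or In-Reply-To headers; otherwise it starts a new thread
    whose ID is its own message_id.
    """
    thread_of = {}
    for msg in messages:
        ancestors = list(msg.get("references", []))
        if msg.get("in_reply_to"):
            ancestors.append(msg["in_reply_to"])
        thread_id = next(
            (thread_of[a] for a in ancestors if a in thread_of),
            msg["message_id"],
        )
        thread_of[msg["message_id"]] = thread_id
    return thread_of


messages = [
    {"message_id": "<m1@example.com>"},
    {"message_id": "<m2@example.com>", "in_reply_to": "<m1@example.com>"},
    {"message_id": "<m3@example.com>", "in_reply_to": "<m2@example.com>",
     "references": ["<m1@example.com>", "<m2@example.com>"]},
    {"message_id": "<m4@example.com>"},
]
threads = assign_threads(messages)
print(threads["<m3@example.com>"])  # <m1@example.com>
```

This sketch does not handle replies that arrive before their parent or clients that omit the headers; a production system would need fallbacks such as merging threads later or matching on normalized subject lines.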

Security and Privacy Measures

Indexing Strategies for Fast Search

Putting It Together – Data Management Summary:

All components must coordinate to maintain data consistency. For example, when an email is delivered, multiple writes happen: one to blob store, one to metadata DB, one to search index, one to notification system, etc. We have to design transactions or idempotent operations carefully. In such a large system, a fully ACID transaction spanning all these is not feasible, so we use eventual consistency and carefully handle failures:
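A minimal sketch of idempotent, retried writes across several stores is shown here. The Store class, its put method, and the deliver function are hypothetical stand-ins for the blob store, metadata DB, and search index clients; the point is that keying every write by email ID makes a retry or full redelivery harmless.

```python
class Store:
    """Hypothetical store whose put() is idempotent: same key, same result."""

    def __init__(self, name, fail_times=0):
        self.name = name
        self.data = {}
        self.fail_times = fail_times  # simulate transient failures
        self.calls = 0

    def put(self, key, value):
        self.calls += 1
        if self.fail_times > 0:
            self.fail_times -= 1
            raise ConnectionError(self.name + " temporarily unavailable")
        self.data[key] = value  # overwriting with the same value is harmless


def deliver(email_id, email, stores, max_attempts=3):
    """Write the email to every store in order, retrying failed steps."""
    for store in stores:
        for attempt in range(1, max_attempts + 1):
            try:
                store.put(email_id, email)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up; the caller can re-queue the whole delivery
    return True


blob = Store("blob")
metadata = Store("metadata", fail_times=1)
search_index = Store("search-index")

email = {"subject": "Project Update"}
deliver("12345", email, [blob, metadata, search_index])
deliver("12345", email, [blob, metadata, search_index])  # redelivery is safe

print(metadata.calls, len(metadata.data))  # 3 1
```

Between the first and last write the stores briefly disagree (the blob exists before the metadata row), which is the eventual-consistency window the text refers to.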

We also consider backup strategies: regular backups of metadata, and possibly real-time replication of data across regions so that disaster recovery is seamless (discussed further in the reliability section).

Now that we have a clear picture of data design and component details, we should address how to scale this system and keep it performant under heavy load.

Step 7: Scalability and Performance Strategies

To meet the scale requirements, the system must employ various strategies for load balancing, partitioning, caching, and asynchronous processing. We outline these strategies and how they ensure the system runs efficiently as it grows.

Partitioning/Shard Strategy (Example): Suppose we have 1 billion users and we decide each storage shard should handle ~1 million users for manageability. That would require 1000 shards. Each shard could be a set of machines (for replication). We might group shards into bigger groups (like 10 sets of 100 shards, each group in a region). The directory service maps user to shard. If one shard becomes too loaded (maybe users on that shard collectively store more data than expected), we can split it: user IDs could be re-hashed into a new scheme or we migrate some users to another shard in the background. Tools to move users’ data between shards would be essential for operations.
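The shard arithmetic and the directory lookup from this example can be sketched as follows. Default placement by user-ID range is an assumption made for illustration (the text leaves the scheme open), and the ShardDirectory class and its methods are hypothetical.

```python
TOTAL_USERS = 1_000_000_000
USERS_PER_SHARD = 1_000_000
NUM_SHARDS = TOTAL_USERS // USERS_PER_SHARD
print(NUM_SHARDS)  # 1000


class ShardDirectory:
    """Maps users to shards: range-based by default, with per-user overrides."""

    def __init__(self, num_shards, users_per_shard):
        self.num_shards = num_shards
        self.users_per_shard = users_per_shard
        self.overrides = {}  # user_id -> shard, for users moved off their default

    def shard_for(self, user_id):
        if user_id in self.overrides:
            return self.overrides[user_id]
        return (user_id // self.users_per_shard) % self.num_shards

    def migrate(self, user_id, new_shard):
        # A real migration would copy the user's data to the new shard first
        # and only then flip this pointer.
        self.overrides[user_id] = new_shard


directory = ShardDirectory(NUM_SHARDS, USERS_PER_SHARD)
print(directory.shard_for(2_500_000))  # 2
directory.migrate(2_500_000, 999)
print(directory.shard_for(2_500_000))  # 999
```

Keeping overrides in the directory lets operators move individual heavy users without re-hashing everyone, at the cost of one extra lookup on each request.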

Trade-offs in Scalability:

By leveraging these scalability strategies, the system can handle growing load while maintaining performance. Next, we address how to ensure high reliability and availability in such a distributed system.

Step 8: Reliability and Availability Considerations

Designing for 99.999% uptime and high reliability requires eliminating single points of failure and preparing for disasters. Here are the key reliability strategies:

By implementing these reliability measures, we aim to meet the 99.999% uptime goal. Users should very rarely experience an outage or email loss. Now, we conclude with considerations for future improvements and evolution of the system.
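To put the 99.999% target in perspective, the permitted downtime can be computed directly. The helper name downtime_budget_minutes is hypothetical, and a 365-day year is assumed.

```python
def downtime_budget_minutes(availability, days=365):
    """Minutes of allowed downtime over the period for a given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability)


for label, availability in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(label, round(downtime_budget_minutes(availability), 2), "minutes/year")
```

Five nines leaves roughly 5.26 minutes of total downtime per year, which is why failover has to be automatic rather than dependent on manual intervention.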

Step 9: Evolution, Extensions, and Future Optimizations

Designing a system like this is not a one-time task; it will evolve. Here are some future considerations and improvements that can be made as the service grows and user needs change:

Finally, any large-scale system must be prepared to evolve. The architecture we designed is modular, which should accommodate replacing or upgrading components as needed. Continuous deployment practices will help roll out improvements with minimal disruption.
