
Designing Reddit


We are building an online social media platform similar to Reddit, where users can share content and discuss it in community forums. In this system, users create posts (which can be text, images, links, or videos) and submit them to topic-based communities called subreddits. Other users can then engage by voting on posts (upvote or downvote) and adding comments to discuss the content. A real-world example is Reddit itself – often dubbed “the front page of the internet” – which hosts thousands of active communities for different interests. Each post has a score based on user votes, and posts are ranked within their community and on users’ personalized feeds.

Key Entities/Terms:

Reddit Key Entities

Before designing, it’s crucial to pin down what features and constraints our system must satisfy. We identify both functional requirements (features) and non-functional requirements (performance and scale goals).

Functional Requirements (Features):

Non-Functional Requirements (Scale and Performance):

These requirements define our target: a feature-rich platform that feels real-time and reliable to users, and can scale to large communities with heavy read traffic. Next, we estimate scale to guide our design choices.

Step 2. Back-of-the-Envelope Capacity Estimation

To design for scale, let's estimate the expected workload and data volumes:

Summary

Details

These estimates guide the design decisions (for example, the heavy read ratio suggests aggressive caching and replication, while the high write counts for votes suggest specialized handling). The system should also be designed to scale beyond these numbers, as a Reddit-like platform could continue to grow.

Step 3. API Specifications

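Here are the core internal APIs for a Reddit-like web-scale service. These internal APIs are not exposed publicly and are typically protected via network-level controls or service identity rather than by re-authenticating the user at every service. The end user's identity is still carried on each request (for example, as a bearer token) so that results can be personalized.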

1. Fetch Home Feed (GET /feed)

Description: Retrieves a personalized home feed for the authenticated user. The feed service aggregates posts from subreddits the user follows and recommended posts, sorted according to a specified ranking algorithm (e.g., hot, new, top). Supports pagination for infinite scroll. This is typically called when a user opens their home page feed.

Endpoint: GET /feed
Method: GET

Request Parameters:

Sample Request:

GET /feed?sort=hot&limit=20&after=t3_xy123 HTTP/1.1
Host: api.internal.redditclone.com
Authorization: Bearer <USER_JWT_TOKEN>

In this example, the client requests 20 posts from the home feed, sorted by the "hot" algorithm, starting after post ID t3_xy123 (as a pagination cursor).

Response Format:

Expected Behavior & Use Case: The Feed service uses this endpoint to serve the home timeline for users. On success, it returns a personalized list of posts that the user is allowed to see (e.g., from subreddits they subscribe to or general recommendations). The results are ordered by the specified sorting algorithm (defaulting to Reddit’s “hot” ranking if none is specified). Pagination ensures the client can fetch additional posts by providing the after cursor from the previous response. This design allows the home feed to be dynamically generated and continuously fetched as the user scrolls, while maintaining performance at web scale (via cursor-based pagination and internal caching of personalized feeds).
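
The cursor mechanics described above can be sketched in a few lines. This is a minimal illustration, assuming the feed has already been ranked and that the response carries the next cursor in a field named `after`; the `Post` class, the `paginate_feed` function, and the response shape are hypothetical names, not a confirmed part of the API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Post:
    post_id: str   # e.g. "t3_xy123"
    score: float   # precomputed rank score (hypothetical field)


def paginate_feed(ranked_posts: list[Post], limit: int = 20,
                  after: Optional[str] = None) -> dict:
    """Return one page of an already-ranked feed plus the next cursor."""
    start = 0
    if after is not None:
        # Resume right after the cursor post; an unknown cursor
        # falls back to the top of the feed.
        for i, post in enumerate(ranked_posts):
            if post.post_id == after:
                start = i + 1
                break
    page = ranked_posts[start:start + limit]
    has_more = start + limit < len(ranked_posts)
    # The last post of the page becomes the cursor for the next request.
    next_cursor = page[-1].post_id if page and has_more else None
    return {"posts": page, "after": next_cursor}
```

A client would pass the returned `after` value back on the next request and stop when it is `None`. A production feed would look up the cursor in an index rather than scanning a list, but the contract seen by the client is the same.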

2. Create a Post (POST /posts)

Description: Allows an authenticated user to create a new post in a specific subreddit. The post can be a text post, a link, or a media post (image or video). The service will validate the input (e.g., presence of required fields, title length, and content type consistency) before creating the post. This endpoint is used when a user submits a new post through the app or website.

3. Fetch Comments on a Post (GET /posts/{post_id}/comments)

Description: Retrieves all comments for a given post, returned in a threaded (nested) format. Supports pagination for posts with large numbers of comments and allows sorting by different criteria (e.g., new, top, old). Used when a user views a post and its comment thread.

4. Vote on a Post or Comment (POST /votes)

Description: Records an upvote or downvote from a user on a given post or comment. This is an idempotent action that sets the user’s vote to a specific value (upvote, downvote, or neutral) on the target item. A user holds at most one vote per item: repeating a request with the same value changes nothing, and sending a different value replaces the previous vote. Used when a user clicks the upvote/downvote buttons.
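
The "set, don't increment" semantics are what make this endpoint idempotent. The sketch below shows one way to apply them, assuming votes are encoded as +1, -1, and 0 (neutral) and kept in an in-memory map; the `VoteStore` class and its method names are hypothetical and stand in for the Vote Service's real storage.

```python
class VoteStore:
    """In-memory stand-in for the vote table and denormalized scores."""

    def __init__(self) -> None:
        self._votes: dict[tuple[str, str], int] = {}   # (user, target) -> vote
        self._scores: dict[str, int] = {}              # target -> net score

    def set_vote(self, user_id: str, target_id: str, value: int) -> int:
        """Set the user's vote to `value` and return the target's new score."""
        if value not in (-1, 0, 1):
            raise ValueError("value must be -1, 0, or 1")
        key = (user_id, target_id)
        previous = self._votes.get(key, 0)
        # Only the difference from the previous vote changes the score,
        # so replaying the same request has no further effect.
        delta = value - previous
        if value == 0:
            self._votes.pop(key, None)
        else:
            self._votes[key] = value
        self._scores[target_id] = self._scores.get(target_id, 0) + delta
        return self._scores[target_id]
```

Because the score moves by `value - previous`, a retried request (for example, after a network timeout) cannot double-count a vote.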

5. Send a Direct Message (POST /messages)

Description: Sends a private direct message from the authenticated user to another user. The message is stored persistently (in the messaging service’s database) and triggers a notification for the recipient. This is used when a user composes a new message or replies to an existing conversation in their inbox.

6. Fetch Notifications (GET /notifications)

Description: Retrieves a list of notifications for the authenticated user, with a focus on unread notifications. Notifications can include things like someone mentioning the user, replying to their post or comment, receiving a direct message, or other alerts. The endpoint supports pagination for when there are many notifications, and can optionally mark notifications as read once retrieved.

7. Search Posts and Comments (GET /search)

Description: Provides a search across posts and comments in the Reddit-like system based on a query. Supports filtering by subreddit, author, and time, as well as specifying whether to search posts, comments, or both. This is used when a user enters keywords in the search bar to find relevant content.

Step 5. High-Level System Design

At a high level, we’ll design the platform as a set of distributed services and components that handle different responsibilities. The system will be structured to handle two primary flows: content submission (writes) and content consumption (reads). We will use a modular, microservices-oriented architecture to allow each piece to scale independently and to isolate complexity.

Key Points

Reddit High-level Design

Details

Below are details of the major components and how they interact:

Detailed Design of Reddit
Reddit Comments Tree

Inter-service Communication: The microservices will primarily expose RESTful APIs (or gRPC endpoints) that the API Gateway calls. For example, when a user loads a subreddit page, the request might go to an endpoint in the API Gateway, which then calls the Post Service (to get posts) and the Comment Service (to get comment counts) and aggregates the response. However, many operations are best handled asynchronously via an event stream. We will use a distributed messaging system like Apache Kafka (or RabbitMQ) to enable event-driven processing. For instance, when a user upvotes a post, the Vote Service updates the score in its database and publishes an event such as "PostUpvoted" to a Kafka topic. The Notification Service consumes that event to send a notification to the post author, and the Post Service consumes it to recompute the post’s rank. Using a queue decouples these actions and improves throughput (services don’t block waiting on each other). It also provides a buffer during spikes.
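
To make the decoupling concrete, here is a minimal in-process sketch of the publish/consume pattern. It is not Kafka: the `EventBus` class, the `"votes"` topic name, and the event fields are hypothetical, and a real deployment would use a broker client library instead. The point it illustrates is that `publish` returns immediately and consumers run later, independently of the publisher.

```python
from collections import defaultdict, deque
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Toy stand-in for a message broker: buffered publish, deferred delivery."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self._queue: deque[tuple[str, dict]] = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher only enqueues; it never waits on consumers.
        self._queue.append((topic, event))

    def drain(self) -> int:
        """Deliver all buffered events; return the number of deliveries made."""
        deliveries = 0
        while self._queue:
            topic, event = self._queue.popleft()
            for handler in self._subscribers[topic]:
                handler(event)
                deliveries += 1
        return deliveries


bus = EventBus()
notifications: list[str] = []
rerank_requests: list[str] = []

# Two independent consumers of the same event, as in the upvote example.
bus.subscribe("votes", lambda e: notifications.append(e["post_id"]))
bus.subscribe("votes", lambda e: rerank_requests.append(e["post_id"]))

bus.publish("votes", {"type": "PostUpvoted", "post_id": "t3_xy123"})
bus.drain()  # in production this happens in separate consumer processes
```

The queue between `publish` and `drain` is also what provides the buffer during spikes: events accumulate there until consumers catch up.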

This high-level design emphasizes separation of concerns: each service has a focused responsibility, and we’ve outlined how data flows through the system. Next, we’ll delve deeper into specific components and data management details, and address how to handle the core challenges (like feed fan-out, data storage, and maintaining performance at scale).

Note: In an initial version, one might start with a monolithic architecture (all web/app logic on one server cluster and one big database). However, at “web-scale” (hundreds of millions of users), a monolith would quickly break down. The system must be distributed. Reddit’s own architecture evolved from a monolith to multiple services for scalability. Our design adopts a modular approach from the start, enabling us to meet the scale requirements.

In this section, we detail how each major component will work internally and how data will be organized and managed. This includes data schemas, partitioning strategies, and the implementation of different features under the hood.

Key Points

Details

Data Model for Posts, Comments, Subreddits, and Users

| Field Name | Data Type | Description |
| --- | --- | --- |
| post_id | BIGINT | Primary Key. Unique post ID (could be globally unique across shards, e.g. via a Snowflake ID generator). |
| subreddit_id | BIGINT | FK to Subreddits.subreddit_id. Subreddit in which the post is made. Indexed. |
| author_id | BIGINT | FK to Users.user_id. User who created the post. Indexed. |
| title | VARCHAR(300) | Post title text. |
| content | TEXT | Post content (text or URL). |
| content_type | VARCHAR(20) | Type of post ('text', 'link', 'image', etc.). |
| created_at | DATETIME | Post creation timestamp. |
| updated_at | DATETIME | Last edit time (if edited). |
| score | INT | Denormalized score (net upvotes minus downvotes). |
| comment_count | INT | Denormalized count of comments on this post. |
| is_deleted | BOOLEAN | Flag if the post is deleted/removed. |

Given the volume, posts will be stored in a sharded NoSQL database. Partitioning can be done by a hash of post_id or by subreddit_id. Hashing post_id spreads posts evenly across shards, while keying on subreddit_id keeps a subreddit's posts together so they can be queried without scatter. Using a wide-column store like Cassandra, we might use a primary key of (subreddit_id, post_id): subreddit_id is the partition key and post_id is the clustering column, which allows efficient retrieval of all posts in a subreddit (they are stored together, ordered by post_id). The trade-off is that a very large subreddit becomes a hot partition, which can be mitigated by adding a bucket (for example, a time window) to the partition key. Alternatively, a store like DynamoDB could store each post item with a primary key of post_id and a global secondary index on subreddit_id for querying by subreddit. Data is denormalized as needed to avoid cross-shard joins (each post entry might also store the subreddit name or author username for convenience, or these can be fetched from the User/Subreddit service).
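
The posts table mentions a Snowflake-style generator for post_id. The sketch below shows the general idea, assuming the commonly used layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits; the custom epoch and the class name are arbitrary choices for illustration, not values specified by this design.

```python
import threading
import time


class SnowflakeGenerator:
    """Generates roughly time-ordered 63-bit IDs, unique per worker."""

    EPOCH_MS = 1_577_836_800_000   # 2020-01-01T00:00:00Z (assumed custom epoch)
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Clock moved backwards: keep using the last timestamp.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                    | (self.worker_id << self.SEQUENCE_BITS)
                    | self._sequence)
```

Each application server would be configured with a distinct `worker_id`, so IDs never collide across shards without any coordination at write time. Because the timestamp occupies the high bits, sorting by post_id approximates sorting by creation time.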

| Field Name | Data Type | Description |
| --- | --- | --- |
| comment_id | BIGINT | Primary Key. Unique comment ID. |
| post_id | BIGINT | FK to Posts.post_id. The post on which this comment was made. Indexed. |
| parent_comment_id | BIGINT | FK to Comments.comment_id of the parent comment (NULL if top-level). Indexed. |
| author_id | BIGINT | FK to Users.user_id. User who wrote the comment. |
| content | TEXT | Comment text content. |
| created_at | DATETIME | Comment creation timestamp. |
| updated_at | DATETIME | Last edit timestamp (if edited). |
| score | INT | Denormalized score (upvotes minus downvotes) for the comment. |
| is_deleted | BOOLEAN | Flag if comment is deleted/removed (content may be null or replaced with “[deleted]”). |

The Comment Service can retrieve all comments for a given post_id efficiently by querying the comments table on the post_id partition. For nesting, we can either use recursive queries or fetch all comments for a post and then construct the tree in memory from the parent_comment_id links. Comments can be sharded by post_id or by comment_id. A likely choice is to partition by post_id so that all comments for a post reside in the same partition or set of partitions (ensuring one post’s comment thread can be fetched with minimal scatter). Since a popular post can have thousands of comments, those partitions still need to handle high read volume, so we can cache the top-level comments separately to reduce load.
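
The "fetch everything, then build the tree in memory" option can be sketched as follows. The field names mirror the comments table above; the `CommentNode` class, the `build_comment_tree` function, and the choice to sort siblings by score (a "top" ordering) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CommentNode:
    comment_id: int
    parent_comment_id: Optional[int]   # None for top-level comments
    content: str
    score: int = 0
    replies: list["CommentNode"] = field(default_factory=list)


def build_comment_tree(rows: list[CommentNode]) -> list[CommentNode]:
    """Link flat comment rows into a nested thread; returns top-level comments."""
    by_id = {c.comment_id: c for c in rows}
    roots: list[CommentNode] = []
    for comment in rows:
        parent = by_id.get(comment.parent_comment_id)
        if parent is None:
            # Top-level comment, or a reply whose parent was not fetched.
            roots.append(comment)
        else:
            parent.replies.append(comment)

    # Sort each sibling group by score, highest first. An explicit stack
    # avoids recursion limits on very deep threads.
    stack = [roots]
    while stack:
        siblings = stack.pop()
        siblings.sort(key=lambda c: c.score, reverse=True)
        stack.extend(c.replies for c in siblings)
    return roots
```

This runs in one pass over the rows plus the sorts, so a single partition read followed by in-memory assembly is usually cheaper than issuing one query per nesting level.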

Voting and Score Management

Voting is a critical feature that directly influences content ranking. However, handling billions of votes efficiently is challenging. Key design points for the Vote Service and data handling include:

Feed Generation and Ranking

One of the hardest parts of a social platform is efficiently delivering feeds – the lists of posts users see – especially as content and user counts grow. We have two main approaches to deliver feeds:

We will use a hybrid approach for Reddit-like design:

Feed Ranking: After determining which posts go into a feed, we need to order them. We will implement Reddit-like ranking algorithms:
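
As one concrete illustration of a "hot"-style ranking, the sketch below follows the formula from Reddit's formerly open-source codebase: the base-10 logarithm of the net score, plus a term that grows with the submission time. The constants (the 45000-second divisor and the epoch offset) come from that public code; whether this design would use them unchanged is an assumption, and the function name is hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

# Reference epoch used in Reddit's old open-source ranking code.
REDDIT_EPOCH_SECONDS = 1134028003


def hot_score(ups: int, downs: int, created_at: datetime) -> float:
    """Higher is hotter. `created_at` should be timezone-aware."""
    net = ups - downs
    # Logarithmic vote weight: each 10x in net votes adds one point.
    order = math.log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)
    # Newer posts get a larger time term: 45000 s (12.5 h) adds one point.
    seconds = created_at.timestamp() - REDDIT_EPOCH_SECONDS
    return round(sign * order + seconds / 45000, 7)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
older_popular = hot_score(100, 0, t0)
newer_modest = hot_score(10, 0, t0 + timedelta(hours=13))
```

Because the time term depends only on creation time, the score needs recomputing only when votes change, not continuously as the post ages. In this example `newer_modest` outranks `older_popular`: 13 hours of recency outweigh a tenfold vote advantage, since one order of magnitude in votes is worth 12.5 hours.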

Caching Feeds: We rely heavily on caching for feeds. For example, the front page (r/popular or aggregated) is seen by many users; we can cache that query result for a short time (say 1 minute) so that not every request triggers DB work. User-specific feeds can also be cached in memory (perhaps stored in Redis with a key like feed:user123) for quick retrieval. This cache is invalidated or updated whenever new posts arrive for that user. A job queue can drive these updates: on a new post or vote, we enqueue tasks that refresh the relevant cached lists.
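
The short-lived cache described here follows the cache-aside pattern with a time-to-live. Below is a minimal in-memory sketch of that pattern; in this design the store would more likely be Redis, and the `TTLCache` class, its method names, and the injected `clock` parameter are hypothetical conveniences for illustration and testing.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache-aside helper: serve cached values until they expire."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}   # key -> (expiry, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit
        value = loader()                         # cache miss: do the DB work
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Called by a background job when new posts arrive for this feed.
        self._entries.pop(key, None)


cache = TTLCache(ttl_seconds=60)
front_page = cache.get_or_load("feed:popular", lambda: ["t3_a", "t3_b"])
```

With a 60-second TTL, a page requested thousands of times per second triggers at most about one database query per minute per cache node, at the cost of results that may be up to a minute stale.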

Feed Example: If User A follows subreddit X and Y, and user Z:

User Messaging Design

For user messaging (DMs), the system functions like a simple mail inbox. Real-time delivery (as in chat) is not required, which simplifies things: we do not need persistent WebSocket connections or instantaneous typing indicators. Instead, we focus on reliable storage and retrieval of messages and notifications when new messages arrive.

Messages Table: Stores private direct messages between users. This table is smaller and can be replicated fully or sharded by receiver_id.

| Field Name | Data Type | Description |
| --- | --- | --- |
| message_id | BIGINT | Primary Key. Unique message ID. |
| sender_id | BIGINT | FK to Users.user_id. User who sent the message. |
| receiver_id | BIGINT | FK to Users.user_id. User who is the recipient. |
| content | TEXT | Message content body. |
| sent_at | DATETIME | Timestamp when the message was sent. |
| is_read | BOOLEAN | Flag if the receiver has read the message. |

Workflow for messaging

In short, the Messaging component is like a mini email system: it stores messages reliably, allows retrieval by the user, and ties into notifications for new message alerts. It’s simpler than real-time chat and can be built on top of existing database tech with partitioning.

Notification System

The Notification system ensures users are kept aware of relevant events (without having to constantly check everything). Design considerations for notifications:

Overall, notifications enhance user engagement by alerting users to relevant interactions. The system uses an event-driven approach to capture events and deliver notifications asynchronously. High availability is important (we don’t want to lose notifications), so the process should be reliable (using durable queues and redundant storage).

Search Indexing and Retrieval

Search is a critical feature given the huge volume of content. Users should be able to search for posts (and possibly comments) by keywords, with options to filter by community, date, or author. Implementing this at scale requires a specialized search engine.

In summary, the Search subsystem is built around an engine similar to Elasticsearch with asynchronous indexing.
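
To show what "asynchronous indexing" means in practice, here is a toy inverted index in which writes are queued and only become searchable after a background step processes them. It is a simplified stand-in, not how Elasticsearch is implemented; the `SearchIndex` class, its methods, and the whitespace-style tokenizer are all hypothetical.

```python
import re
from collections import defaultdict, deque
from typing import Optional


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Inverted index (term -> post IDs) fed by a queue of pending documents."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)
        self._subreddit_of: dict[int, str] = {}
        self._pending: deque[tuple[int, str, str]] = deque()

    def enqueue(self, post_id: int, subreddit: str, text: str) -> None:
        # Called on the write path; cheap, does not touch the index.
        self._pending.append((post_id, subreddit, text))

    def process_pending(self) -> int:
        """Background indexing step; returns how many documents were indexed."""
        indexed = 0
        while self._pending:
            post_id, subreddit, text = self._pending.popleft()
            self._subreddit_of[post_id] = subreddit
            for term in tokenize(text):
                self._postings[term].add(post_id)
            indexed += 1
        return indexed

    def search(self, query: str, subreddit: Optional[str] = None) -> list[int]:
        terms = tokenize(query)
        if not terms:
            return []
        # A post matches only if it contains every query term.
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        if subreddit is not None:
            hits = {p for p in hits if self._subreddit_of[p] == subreddit}
        return sorted(hits)
```

The gap between `enqueue` and `process_pending` is the indexing lag: a new post is stored immediately but may take a short while to appear in search results, which is the usual trade-off for keeping the write path fast.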

Putting it Together: Workflow Examples

To illustrate end-to-end how data flows through the designed components, consider a couple of scenarios:

Each of these flows involves multiple components but is designed in such a way that heavy lifting (like updating many indices or sending many notifications) is done asynchronously and does not block the user’s action.

To meet the demands of a web-scale system, we employ multiple strategies to scale horizontally and maintain high performance:

1. Horizontal Scaling of Stateless Services: All our core services (Auth, Post, Feed, etc.) are designed to be stateless and thus can run behind load balancers with multiple instances. If user traffic increases, we simply deploy more instances of each service. Because state (data) lives in the databases and caches, any instance of a service can handle any request addressed to that service. This gives us near-linear scaling on the application layer – for example, if one API server can handle 1,000 requests/sec, 10 servers can handle roughly 10,000 requests/sec, provided the databases and caches behind them keep up. We also plan for auto-scaling: monitoring CPU, memory, and QPS, and automatically adding or removing server instances according to demand (which is useful given diurnal traffic patterns).

2. Database Sharding and Replication: The databases are often the bottleneck in large systems, so we must scale them:

3. Caching Everywhere: Caches are our best friend for performance:

4. Asynchronous Processing: We extensively use message queues and background workers to handle tasks that need to be done but not necessarily immediately:

5. Load Balancing and Traffic Management:

6. Handling Failures and Redundancy:

7. Performance Optimizations:

Trade-offs and Final Thoughts: Our design choices balance complexity with performance:

Finally, with this architecture, our platform should be able to handle the target load: the modular design allows scaling each part (web servers, databases, cache nodes) independently. We have addressed the main challenges of a Reddit-like system – ensuring fast reads through caching and precomputation, handling heavy writes via queues and sharding, and keeping the user experience real-time and personalized. The design is complex, but it covers reliability, scalability, and performance, which are all essential for a modern social media system at large scale.
