
Designing Flash Sale System (Difficulty: Hard)


Step 1: System Definition

A flash sale system is an e-commerce component designed to handle short-term sales events where a limited inventory of products is sold to a massive number of buyers in a very short time frame. In such events, traffic spikes dramatically (e.g. hundreds of thousands of users hitting the site simultaneously), and the number of purchase requests far exceeds the available items (for example, millions of requests for only thousands of items). The system must rapidly serve many concurrent users, reliably track inventory, and fairly allocate the product items. This requires a highly scalable and robust architecture that can maintain consistency (no overselling of products), low latency for user actions, and high availability despite extreme load. The flash sale system typically integrates with the broader e-commerce platform (user accounts, product catalog, payment processing) but uses specialized strategies (caching, queuing, etc.) to withstand bursty traffic and ensure a smooth user experience during the sale.

Core Entities:

Step 2: Clarify and Define Requirements

Functional Requirements:

Non-Functional Requirements:

Step 3: Back-of-the-Envelope Capacity Estimation

Let’s estimate the scale to ensure our design can handle it:

Step 4: High-Level Architecture

At a high level, the flash sale system will use a multi-tier, distributed architecture with components to handle presentation, application logic, caching, queuing, and persistence. Below is an outline of the key components and how they interact:

High-level Design

Caching Layer

To reduce database load, a cache is employed for frequently accessed data. The cache (e.g., Redis or Memcached) will store data like product details. For relatively static data (product info, images, descriptions, flash sale configuration), caching is straightforward and can drastically cut down read traffic to the DB. For example, product pages can be served from cache or even a CDN if they don’t change often during the sale. The cache is updated whenever the DB updates so it stays consistent.
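The caching behaviour described above can be sketched as a read-through cache with invalidation on write. This is a minimal illustration, not the system's actual implementation: `ProductCache`, `load_product`, `update_price`, and the in-memory `PRODUCTS` dictionary are hypothetical stand-ins for Redis/Memcached and the product database, and the sketch invalidates the cached entry on write rather than rewriting it, which is one common way to keep the cache consistent with the DB.

```python
import time
from typing import Callable, Dict, Tuple


class ProductCache:
    """In-memory stand-in for Redis/Memcached (hypothetical helper, not a real client)."""

    def __init__(self, load_from_db: Callable[[int], dict], ttl_seconds: float = 60.0):
        self._load_from_db = load_from_db   # called only on a cache miss
        self._ttl = ttl_seconds
        self._entries: Dict[int, Tuple[float, dict]] = {}
        self.db_reads = 0                   # counts how often the DB was hit

    def get(self, product_id: int) -> dict:
        entry = self._entries.get(product_id)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]                 # cache hit: no DB access
        self.db_reads += 1
        value = self._load_from_db(product_id)
        self._entries[product_id] = (time.monotonic() + self._ttl, value)
        return value

    def invalidate(self, product_id: int) -> None:
        self._entries.pop(product_id, None)


# Toy "database" and write path, for illustration only.
PRODUCTS = {1: {"name": "Phone", "price": "199.00"}}


def load_product(product_id: int) -> dict:
    return dict(PRODUCTS[product_id])


def update_price(cache: ProductCache, product_id: int, price: str) -> None:
    PRODUCTS[product_id]["price"] = price   # 1. write the DB first
    cache.invalidate(product_id)            # 2. then drop the stale cached copy
```

With this shape, a thousand reads of the same product page cause a single database read, and the next read after `update_price` reloads the fresh row.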

Step 5: Database Schema

Below is a detailed relational schema for the key tables — Users, Products, Inventory, Reservation, Orders, and Payments — followed by indexing, partitioning, and constraint recommendations to optimize performance for high-volume events.

Users Table

This table stores user accounts and authentication details.

| Field Name | Data Type | Description |
| --- | --- | --- |
| user_id | BIGINT (PK) | Unique user identifier (primary key, auto-increment). |
| email | VARCHAR(255) UNIQUE | User's email address (used for login; must be unique). |
| password_hash | VARCHAR(255) | Hashed password for authentication. |
| salt | VARCHAR(255) | Salt used for hashing the password (if applicable). |
| name | VARCHAR(100) | User’s full name. |
| created_at | DATETIME | Timestamp when the account was created. |
| updated_at | DATETIME | Timestamp of the last update to the account information. |
| status | VARCHAR(50) | Account status (e.g., 'active', 'disabled', 'banned'). |

Notes:

Products Table

Manages items available for the flash sale.

| Field Name | Data Type | Description |
| --- | --- | --- |
| product_id | BIGINT (PK) | Unique product identifier (primary key, auto-increment). |
| name | VARCHAR(200) | Product name. |
| description | TEXT | Detailed description of the product. |
| price | DECIMAL(10,2) | Regular price of the product. |
| category | VARCHAR(100) | Category or type of product (for organization/filtering). |
| status | VARCHAR(50) | Product status (e.g., 'active', 'out_of_stock', 'discontinued'). |
| created_at | DATETIME | Timestamp when the product was added to the catalog. |
| updated_at | DATETIME | Timestamp of the last update to product info or price. |

Notes:

Inventory Table

Tracks stock levels for each product to prevent overselling during the flash sale.

| Field Name | Data Type | Description |
| --- | --- | --- |
| product_id | BIGINT (PK, FK to Products) | Product identifier (primary key, references Products). |
| quantity | INT | Available stock for the product. This value is decremented when orders or reservations occur. |
| last_updated | DATETIME | Timestamp of the last inventory update. |

Notes:

Reservation Table

Temporarily holds stock for a user who is in the process of checking out, to prevent others from buying that stock until the user completes payment or the hold expires.

| Field Name | Data Type | Description |
| --- | --- | --- |
| reservation_id | UUID (PK) | Unique reservation identifier (primary key). |
| user_id | BIGINT (FK to Users) | User who reserved the product (foreign key to Users). |
| product_id | BIGINT (FK to Products) | Product reserved (foreign key to Products). |
| quantity | INT | Quantity reserved for this user. |
| reserved_at | DATETIME | Timestamp when the reservation was created. |
| expires_at | DATETIME | Timestamp when the reservation will expire if not completed. |
| status | VARCHAR(50) | Reservation status ('PROCESSING', 'RESERVED', 'FAILED', 'SUCCEEDED', etc). |

Notes:

Orders Table

Records purchases made by users during the flash sale. Each order is a purchase of a product by a user.

| Field Name | Data Type | Description |
| --- | --- | --- |
| order_id | BIGINT (PK) | Unique order identifier (primary key, often auto-increment). |
| user_id | BIGINT (FK to Users) | User who placed the order (foreign key to Users). |
| reservation_id | UUID (FK to Reservation) | The reservation associated with this order (foreign key to Reservation). |
| product_id | BIGINT (FK to Products) | Product that was purchased (foreign key to Products). |
| quantity | INT | Quantity of the product purchased in this order. |
| price | DECIMAL(10,2) | Purchase price per unit at the time of order (captures flash sale price or discount if any). |
| status | VARCHAR(50) | Order status (e.g., 'pending', 'completed', 'cancelled', 'refunded'). |
| created_at | DATETIME | Timestamp when the order was created. |
| updated_at | DATETIME | Timestamp of the last update to the order status or details. |

Notes:

Payments Table

Logs payment transactions (successful or failed) for orders.

| Field Name | Data Type | Description |
| --- | --- | --- |
| payment_id | BIGINT (PK) | Unique payment transaction identifier (primary key). |
| order_id | BIGINT (FK to Orders) | Associated order that this payment is for (foreign key to Orders). |
| user_id | BIGINT (FK to Users) | User who made the payment (foreign key to Users; redundant to order’s user for quick access). |
| amount | DECIMAL(10,2) | Payment amount (should match order total for successful payments). |
| method | VARCHAR(50) | Payment method (e.g., 'credit_card', 'paypal', 'wallet'). |
| status | VARCHAR(50) | Payment status ('success', 'failed', 'pending'). |
| transaction_ref | VARCHAR(100) | Reference ID from the payment gateway (e.g., transaction ID). |
| timestamp | DATETIME | Timestamp of the payment attempt. |
| failure_reason | VARCHAR(255) | Reason for failure if the payment did not succeed (nullable). |

Notes:

Indexing and Partitioning Strategies

To support high concurrency and fast performance, proper indexing and data partitioning are essential:

Step 6: Detailed Component Design

In a flash sale, each buy request transitions through several states as it is processed by different services. Below is the lifecycle of a transaction from initiation to completion or termination:

1. Transaction States

2. Workflow Steps

  1. WAITING (Queued): When a user clicks “Buy” during the flash sale, the request is accepted by the Booking Service, which creates a reservation request as a Kafka message and publishes it to a Kafka topic specific to that product type. The message contains the reservation_id (a UUID), user_id, product_id, and a timestamp. At this stage, no database entry exists yet; the request is simply waiting in the queue to be handled in a first-come, first-served manner. Queuing the incoming orders helps throttle the surge of requests and ensures orderly, sequential inventory updates. The reservation remains in this WAITING state until a worker service (Order Service) dequeues it for processing.

  2. PROCESSING (Underway): An Order Service instance pulls the request from Kafka and begins processing it. At this point, a reservation record is created in the database with the status PROCESSING to track the in-progress transaction. The Order Service now coordinates with the Inventory Service to reserve an item for the customer and then coordinates with the Payment Service to process the payment.

  3. RESERVED (Inventory Held/Pending Payment): In this state, the item has been successfully reserved for the order. The Inventory Service decrements the available stock so that no other customer can purchase this item while payment is pending. It also updates the reservation status to RESERVED in the database and notifies the Order Service, which then tells the user that their item is locked in, typically by redirecting them to a payment page. The reservation is temporary: the user is given a fixed time window (e.g., 5 minutes, as configured for the flash sale) to complete the payment, and the system starts an expiration timer for the reservation. During this window the order is effectively in a pending payment state. Holding the stock prevents overselling (selling more items than are in stock), while the expiry window avoids underselling (stock held indefinitely by buyers who never pay). If the item is already sold out by the time of processing, the Order Service marks the reservation as FAILED (out of stock) and triggers a failure response. Here is how we can atomically decrement the inventory and update the reservation status using a database transaction:

```sql
-- @productId     : ID of the product being purchased
-- @reservationId : ID of the reservation currently in 'PROCESSING' state
START TRANSACTION;

-- 1. Decrement inventory only if stock is still available.
UPDATE Inventory
SET quantity = quantity - 1
WHERE product_id = @productId
  AND quantity > 0;

-- ROW_COUNT() is 1 if a unit was taken, 0 if the product is sold out.
SET @rows_updated = ROW_COUNT();

-- 2. Mark the reservation RESERVED or FAILED based on inventory availability.
--    The IF() function is used here because IF ... THEN blocks are only
--    valid inside stored programs in MySQL.
UPDATE Reservation
SET status = IF(@rows_updated = 1, 'RESERVED', 'FAILED')
WHERE reservation_id = @reservationId
  AND status = 'PROCESSING';

COMMIT;
```

The above query is possible only when Inventory and Reservation tables are on the same shard, because we can't have a DB transaction updating cross-shard tables. In the 'Database Schema' section, we suggested partitioning the DB based on user_id to distribute reservations onto multiple shards (as compared to partitioning based on product_id, which can overload a partition containing a hot product). To handle partitioning based on user_id, we will update the inventory separately and then update the reservation record. Since this will not be happening in one DB transaction, we could have a failing scenario where the Inventory Service decrements the inventory record but crashes before updating the reservation status to 'RESERVED'. Now, when another instance of the Inventory Service takes up this request, it will decrement the inventory again, as the reservation is still in 'PROCESSING' state.

To handle this scenario, where the Inventory and Reservation tables are on different shards, we will use a helper table called InventoryUpdated. Here’s how it works:

  1. Updating Inventory: When the Inventory Service decrements the stock, it also inserts a record into the InventoryUpdated table which is also present on the same shard. Both of these queries will happen in one DB transaction. This record in InventoryUpdated acts like a flag, marking that the inventory has already been updated for this request.

  2. Crash Handling: Now, if the service crashes before it can update the Reservation record, the presence of the record in InventoryUpdated tells any new service instance that the inventory has already been decremented for this request. The service checks for this record against the reservation and decrements the stock only if no such record exists. This prevents the system from subtracting the stock a second time when retrying the workflow.

  3. Finalizing the Reservation: Once the Reservation is marked RESERVED, the Inventory Service will delete the corresponding record from InventoryUpdated. This cleanup ensures this helper table stays small and only contains pending updates.

In summary, the InventoryUpdated table is used to safely coordinate the inventory deduction when working across different shards. It prevents accidental double-decrementing of stock if a failure occurs during the process, ensuring that stock is deducted at most once per reservation even when the request is retried.
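The InventoryUpdated workflow above can be sketched with two SQLite in-memory databases standing in for the two shards. This is a sketch under stated assumptions rather than the production design: the column layout of InventoryUpdated (`reservation_id`, `product_id`), the function names, and the `crash_after_decrement` switch used to simulate a crash are all hypothetical.

```python
import sqlite3


def make_shards():
    """Create the inventory shard and the reservation shard (both in memory)."""
    inv = sqlite3.connect(":memory:")
    inv.executescript("""
        CREATE TABLE Inventory (product_id INTEGER PRIMARY KEY, quantity INTEGER NOT NULL);
        CREATE TABLE InventoryUpdated (reservation_id TEXT PRIMARY KEY, product_id INTEGER NOT NULL);
    """)
    res = sqlite3.connect(":memory:")
    res.execute("CREATE TABLE Reservation (reservation_id TEXT PRIMARY KEY, product_id INTEGER, status TEXT)")
    return inv, res


def decrement_once(inv, reservation_id, product_id) -> bool:
    """Return True if a unit is held for this reservation, now or from an earlier attempt."""
    with inv:  # single transaction on the inventory shard
        already = inv.execute(
            "SELECT 1 FROM InventoryUpdated WHERE reservation_id = ?", (reservation_id,)
        ).fetchone()
        if already:
            return True  # an earlier attempt already took the unit
        cur = inv.execute(
            "UPDATE Inventory SET quantity = quantity - 1 WHERE product_id = ? AND quantity > 0",
            (product_id,),
        )
        if cur.rowcount != 1:
            return False  # sold out
        inv.execute(
            "INSERT INTO InventoryUpdated (reservation_id, product_id) VALUES (?, ?)",
            (reservation_id, product_id),
        )
        return True


def reserve(inv, res, reservation_id, product_id, crash_after_decrement=False) -> str:
    row = res.execute(
        "SELECT status FROM Reservation WHERE reservation_id = ?", (reservation_id,)
    ).fetchone()
    if row is None or row[0] != "PROCESSING":
        return row[0] if row else "UNKNOWN"  # already finalized: nothing to do
    held = decrement_once(inv, reservation_id, product_id)
    if crash_after_decrement:
        raise RuntimeError("simulated crash before the Reservation update")
    status = "RESERVED" if held else "FAILED"
    with res:  # separate transaction on the reservation shard
        res.execute(
            "UPDATE Reservation SET status = ? WHERE reservation_id = ? AND status = 'PROCESSING'",
            (status, reservation_id),
        )
    with inv:  # cleanup keeps the helper table small
        inv.execute("DELETE FROM InventoryUpdated WHERE reservation_id = ?", (reservation_id,))
    return status
```

Calling `reserve` once with `crash_after_decrement=True` and then again normally leaves the stock decremented by one, not two, because the retry finds the marker row. The status check at the top covers a redelivery that arrives after the marker has been cleaned up.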

  4. SUCCEEDED (Completed): This is the successful completion state for the transaction. It is reached when the Payment Service confirms that payment was completed within the allowed time window. Once the user pays (e.g., entering payment details and the payment is approved), the Payment Service notifies the Order Service (via a Kafka message) that the payment is complete. The Order Service then updates the reservation status to SUCCEEDED in the database, finalizing the transaction. At this point, the item is officially sold and will not be returned to inventory. Downstream actions can be triggered here (e.g. generating an order confirmation for the user, notifying fulfillment/shipping services). The transition from RESERVED to SUCCEEDED marks a successful flash sale purchase.

  5. FAILED (Error or Payment Failed): The reservation becomes FAILED if the payment ultimately cannot be completed. The user attempted payment but it was declined or did not succeed even after retries. For example, the Payment Service tried charging the card multiple times or via multiple methods and exhausted all retry attempts without success. In this case, the reservation is marked as FAILED due to payment failure. The reserved inventory is then released back to stock (since the item was never actually paid for).

  6. EXPIRED (Timed Out): If the user does not complete payment within the allotted reservation time, the reservation moves to EXPIRED state. This is an automatic transition triggered by a timeout. A dedicated Reserved Transaction Expiry Handler watches for reservations stuck in RESERVED beyond the payment window. When a timeout occurs, it updates the reservation status to EXPIRED and publishes an event (on a Kafka topic) indicating the reservation expired. The Inventory Service consumes this event and increments the stock back, effectively restoring the item to inventory. This design ensures that inventory is not permanently lost due to an abandoned cart: the quantity is returned for others to buy once the original reservation expires. From the user’s perspective, an expired order typically means they took too long to pay and the order was canceled by the system. They might receive a notification that the item was released, and if they still want it, they’d have to place a new order. The system must also handle a payment notification that arrives after expiration, for example by rejecting the payment and initiating a refund if the order is already expired, since a late payment event can occur in rare cases due to network delays.

  7. CANCELED (User Aborted): This state occurs when the user actively cancels the reservation before completing payment. For instance, if the user changes their mind and clicks a “Cancel Order” button during the payment phase (while the reservation is RESERVED), the system will mark the reservation as CANCELED. Like expiration, cancellation triggers the release of reserved inventory back to the stock. The Order Service updates the status to CANCELED in the DB, and an event is sent to the Inventory Service to increment the item count back. From a workflow perspective, a canceled transaction is very similar to an expired one, except it was manually triggered by the user rather than by a timer. The user is typically shown a cancellation confirmation, and the item becomes available to others again.

State Transitions Summary: A typical successful flow is WAITING → PROCESSING → RESERVED → SUCCEEDED. However, at the RESERVED stage the order can also go to EXPIRED (if timed out), FAILED (if payment fails or other error), or CANCELED (if user aborts). Throughout this lifecycle, services communicate via Kafka messages to update the reservation and keep data (like inventory counts) consistent.
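The lifecycle above can be written down as an explicit transition table, which makes illegal moves (for example, EXPIRED to SUCCEEDED on a late payment) easy to reject. The sketch below is illustrative only; the names `State`, `ALLOWED`, `transition`, and `releases_stock` are hypothetical, and the PROCESSING to FAILED edge reflects the sold-out case described in the workflow steps.

```python
from enum import Enum


class State(Enum):
    WAITING = "WAITING"
    PROCESSING = "PROCESSING"
    RESERVED = "RESERVED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    EXPIRED = "EXPIRED"
    CANCELED = "CANCELED"


# Allowed next states for each state; terminal states map to an empty set.
ALLOWED = {
    State.WAITING: {State.PROCESSING},
    State.PROCESSING: {State.RESERVED, State.FAILED},  # FAILED here means sold out
    State.RESERVED: {State.SUCCEEDED, State.FAILED, State.EXPIRED, State.CANCELED},
    State.SUCCEEDED: set(),
    State.FAILED: set(),
    State.EXPIRED: set(),
    State.CANCELED: set(),
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def releases_stock(current: State, target: State) -> bool:
    """Stock goes back to inventory only when a held (RESERVED) unit is given up."""
    return current is State.RESERVED and target in {
        State.FAILED, State.EXPIRED, State.CANCELED
    }
```

Note that a sold-out failure (PROCESSING to FAILED) releases nothing, because no unit was ever held for that reservation.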

Detailed Component Design

3. Background Job: RESERVED Transaction Expiry Handler

Purpose

This background service is responsible for:

  1. Tracking RESERVED transactions in-memory to efficiently detect expired transactions.
  2. Handling expiration logic by marking overdue transactions as EXPIRED, restoring inventory, and notifying the inventory service.
  3. Ensuring recovery from crashes by reloading RESERVED transactions from the database.
  4. Providing scalability and fault tolerance, so that multiple instances can coordinate expiry handling.

1. Storage and Retrieval of RESERVED Transactions

2. Transaction Expiry Check Mechanism

3. Processing Expired Transactions

Notify the Order Service that stock is available by pushing an event to Kafka (e.g., "Inventory.Stock_Released" event). The Order Service will coordinate with the Inventory Service to mark the reservation as EXPIRED in the database and increment inventory back.

4. Concurrency & Scaling Considerations

This design ensures high efficiency, fast detection of expired transactions, and robust crash recovery.
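One possible way to track RESERVED transactions in memory is a min-heap ordered by expiry time, so the handler only ever inspects the reservation that expires soonest. The source does not specify the data structure, so treat the heap, the lazy-deletion approach, and the names `ExpiryTracker`, `track`, `resolve`, `pop_expired`, and `recover` as assumptions made for illustration.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


class ExpiryTracker:
    """Tracks RESERVED reservations by expiry time (hypothetical helper)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []   # (expires_at, reservation_id)
        self._live: Dict[str, float] = {}          # reservations still RESERVED

    def track(self, reservation_id: str, expires_at: float) -> None:
        self._live[reservation_id] = expires_at
        heapq.heappush(self._heap, (expires_at, reservation_id))

    def resolve(self, reservation_id: str) -> None:
        """Call when the reservation is paid, failed, or canceled before expiry."""
        self._live.pop(reservation_id, None)       # heap entry is skipped lazily later

    def pop_expired(self, now: float) -> List[str]:
        """Return reservations whose window has passed and that are still RESERVED."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expires_at, reservation_id = heapq.heappop(self._heap)
            if self._live.get(reservation_id) == expires_at:
                del self._live[reservation_id]
                expired.append(reservation_id)
        return expired

    @classmethod
    def recover(cls, reserved_rows: Iterable[Tuple[str, float]]) -> "ExpiryTracker":
        """Rebuild state after a crash from (reservation_id, expires_at) rows in RESERVED status."""
        tracker = cls()
        for reservation_id, expires_at in reserved_rows:
            tracker.track(reservation_id, expires_at)
        return tracker
```

A periodic loop would call `pop_expired(now)` and publish one expiry event per returned ID. With several handler instances, each instance would need to own a disjoint subset of reservations, or the status update to EXPIRED would need to be conditional on the row still being RESERVED, so that a reservation is not expired twice.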

4. Message Queue and Concurrency Control

Apache Kafka is the backbone of our flash sale request pipeline. All incoming purchase attempts are funneled through Kafka to smooth out the traffic spikes (a technique often called queue-based load leveling). By queuing requests, we ensure the downstream processing (order creation, payment, etc.) happens at a rate the system can handle, rather than being overwhelmed in the first second of the sale. Kafka is well-suited for this due to its high throughput and ability to retain ordered logs of events.

Summary: Kafka ensures sequential processing for each product and serves as a buffer to convert the instantaneous spike into a steady stream. This dramatically reduces contention on shared resources like the database. Using a topic per product supports fairness: Kafka preserves order within a partition, so requests are processed in the order they were enqueued as long as a product's requests land in a single partition. And because of the asynchronous design, user-facing operations remain snappy: the user clicks “Buy” and immediately gets a queued confirmation, rather than waiting for the entire order to complete. This improves perceived performance and avoids users retrying in frustration (which could cause more load).
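The load-leveling effect can be illustrated with a small simulation in which a `deque` stands in for a single Kafka partition and a worker drains a fixed number of requests per tick. This is a toy model, not Kafka client code; `drain`, `per_tick`, and the request IDs are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple


def drain(queue: Deque[str], stock: int, per_tick: int) -> Tuple[Dict[str, str], int]:
    """Process a burst in FIFO order at a fixed rate.

    Returns the outcome per request and the number of ticks the worker needed.
    """
    outcomes: Dict[str, str] = {}
    ticks = 0
    while queue:
        ticks += 1
        for _ in range(min(per_tick, len(queue))):
            request_id = queue.popleft()      # oldest request first
            if stock > 0:
                stock -= 1
                outcomes[request_id] = "RESERVED"
            else:
                outcomes[request_id] = "FAILED"
    return outcomes, ticks
```

A burst of ten requests against three units of stock, drained four per tick, takes three ticks; the three earliest requests are reserved and the rest fail as sold out. The burst size affects how long the queue takes to drain, not how hard the database is hit at any one moment.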

5. Booking Service: Retrieving and Returning Queue Position

Queue Position Tracking Mechanism: The Order Service, which consumes order requests from the Kafka queue, is responsible for determining each user’s updated position in line. Under the hood, each incoming request can be tagged with a queue sequence number (for example, using the Kafka message offset as an identifier for position). In Apache Kafka, every message in a topic partition has an offset, which is essentially its position in the log (queue). The Order Service (as a Kafka consumer) continuously tracks the latest offset it has processed/committed.

When a user’s client polls for an updated position, the server (e.g., a lightweight Queue Service or the Order Service itself) can calculate how many requests are still ahead of that user’s request in the queue. One way to do this is by comparing the user’s message offset (or initial position number) with the Order Service’s current processing offset. For instance, Kafka’s consumer lag (the difference between the latest produced offset and the consumer’s committed offset) indicates how many events remain unprocessed in the queue. By Kafka convention, the committed offset is the offset of the next message to be consumed. So if a user’s request is at offset 1050 and the Order Service’s committed offset is 1025 (everything up to offset 1024 has been processed), there are 25 messages ahead of that request (offsets 1025 through 1049), meaning the user’s current position is 26th. This comparison is only meaningful within a single partition, since offsets are assigned per partition.

Step-by-Step Mechanism:

  1. Assign Queue Position on Enqueue: When a purchase request arrives during the flash sale, it is published to the Kafka queue. At this point, the system can determine the request’s position in line – e.g., by noting the Kafka message offset or by using an incrementing counter. The user is informed of their initial queue position (for example, “You are #300 in line”).
  2. Retrieve Updated Position: When the client asks for an update (via polling), the server checks how many messages have been processed relative to the user’s position. This can be done by computing the number of messages ahead of the user in the queue. For example, the system can subtract the Order Service’s latest committed offset (or count of processed requests) from the user’s message offset to see how many are still in front.
  3. Return/Push Position to User: The server returns this number to the client in its response (e.g., { "queuePosition": 26 }). The client then updates the UI to show the new queue position. This cycle continues until the user reaches the front of the queue (position 1) and their request is picked up for processing by the Order Service.
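The position calculation in the steps above reduces to a subtraction. The sketch below assumes a single partition and assumes the server knows the next offset the Order Service will process (in Kafka terms, its committed offset); `queue_position` and `position_response` are hypothetical names.

```python
import json


def queue_position(user_offset: int, next_offset_to_process: int) -> int:
    """1-based position in line; 0 means the request has already been picked up.

    Assumes both offsets come from the same partition.
    """
    if user_offset < next_offset_to_process:
        return 0
    return user_offset - next_offset_to_process + 1


def position_response(user_offset: int, next_offset_to_process: int) -> str:
    """Build the JSON body returned to a polling client."""
    return json.dumps({"queuePosition": queue_position(user_offset, next_offset_to_process)})
```

For a request at offset 1050 with the consumer about to process offset 1025, this yields position 26; once the consumer reaches offset 1050 the user is at position 1.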

Real-Time Queue Updates: Polling vs WebSockets

Current Approach (Polling): The current design uses client polling to update queue positions. The client repeatedly sends requests to the server at intervals to get the latest position. This is straightforward to implement and works on all browsers (using regular HTTP requests). However, frequent polling can be inefficient – each request/response carries overhead (HTTP headers, connection setup) and may return no new data if the position hasn’t changed much. There’s also a slight latency trade-off: if the client polls, say, every 5 seconds, the user’s position update could be up to 5 seconds out-of-date. Polling too often reduces latency but increases server load; polling less often saves resources but gives less real-time feedback.

Alternative Approach (WebSockets): As an enhancement, the system could use WebSockets to push queue position updates to clients in real time, instead of relying on polling. With a WebSocket, the client opens a persistent connection to the server. The server can then send (push) updated queue positions to the client immediately whenever the position changes, without the client having to ask repeatedly. This means the user’s position on the screen updates in real-time (for example, moving from #42 to #41 as soon as one request ahead is processed).
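The push model can be sketched independently of any particular WebSocket library: the server keeps one send callback per waiting user and invokes it only when that user's position actually changes. Everything here is an assumption made for illustration, including the `PositionBroadcaster` name and the idea that each `send` callback wraps an open WebSocket connection. The loop over all subscribers is deliberately naive.

```python
from typing import Callable, Dict


class PositionBroadcaster:
    """Pushes queue positions to subscribers when they change (hypothetical helper)."""

    def __init__(self) -> None:
        self._subs: Dict[int, Callable[[int], None]] = {}   # user offset -> send callback
        self._last_sent: Dict[int, int] = {}                # user offset -> last position sent

    def subscribe(self, user_offset: int, send: Callable[[int], None]) -> None:
        self._subs[user_offset] = send

    def on_progress(self, next_offset_to_process: int) -> None:
        """Call whenever the consumer advances; pushes only changed positions."""
        for user_offset, send in list(self._subs.items()):
            position = max(0, user_offset - next_offset_to_process + 1)
            if self._last_sent.get(user_offset) == position:
                continue                                    # nothing new to tell this user
            self._last_sent[user_offset] = position
            send(position)
            if position == 0:                               # request picked up: stop tracking
                del self._subs[user_offset]
                del self._last_sent[user_offset]
```

Compared with polling, no message is sent while a user's position is unchanged, and updates arrive as soon as the consumer advances. At flash-sale scale the per-update scan over every subscriber would likely need batching or throttling.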

Trade-offs – Polling vs WebSockets:
