
Designing Payment System


Step 1: System Definition

Design a payment processing platform (similar to Stripe) that enables merchants (businesses) to accept online payments from customers securely and reliably. The system will handle the entire lifecycle of a payment transaction – from capturing payment details to authorizing the transaction, transferring funds, and handling post-payment events (like refunds or chargebacks). Ultimately, this “Stripe-like” system serves as a payment service provider that combines the functionality of a payment gateway and a payment processor in one integrated platform.

Core Entities and Roles:

System Scope: Our payment system will expose APIs for merchants to perform actions such as creating a charge or refunding a charge, similar to Stripe’s API. Internally, it will handle authorization with banks, storage of transaction records, management of customer payment details, fraud checks, and notifications (such as webhooks or emails).

Step 2: Requirements Clarification

Functional Requirements

Non-Functional Requirements

Step 3: Back-of-the-Envelope Capacity Estimation

Step 4: High-Level System Design

Architecture Overview

[Figure: High-Level Architecture]

Data Flow for a Typical Transaction (Credit Card Charge): To illustrate, consider a customer buying a product on a merchant’s website using a credit card:

  1. Client-Side Tokenization: It is common (Stripe does this) for the merchant’s front end to use a JavaScript library or SDK provided by the payment system to collect card details and send them directly to the payment system, which returns a token. This way, the merchant’s backend never sees the raw card number. For example, the browser calls our API to tokenize the card (via the Card Vault) and gets back a token or payment method ID, which it then sends to the merchant’s server.

  2. Charge Request: The merchant’s server (or client) calls our Charge API (e.g., POST /v1/charges) via the API Gateway. They include the amount, currency, and either the token from step 1 or some payment instrument details, plus possibly an idempotency key.

  3. API Gateway: Validates the merchant’s API key/auth token, checks request size/rate limits, then forwards the request to the Payment Service’s endpoint for creating a charge.

  4. Payment Service – Request Validation: The Payment Service first parses the request. It verifies the merchant is allowed to make this charge (e.g., checks account status, whether currency is supported, etc.). It also ensures required fields are present (amount, etc.) and that the amount is positive and within allowed limits. If an idempotency key is provided, it will do a lookup in the Idempotency store to see if this key has been seen for this merchant.

    • If the key exists and a result is stored, it will short-circuit and return the stored result (thus avoiding duplicate processing).
    • If not, it will record this key as in-progress (to prevent a race if a retry comes in while processing).
  5. Fraud Check: The Payment Service calls the Fraud Service to assess risk. If the transaction is high-risk, it may be declined or flagged for manual review.

  6. Card Data Retrieval: The Payment Service needs the card details to send to the acquirer (unless the merchant provided raw card details directly). If a token was provided, the Payment Service calls the Card Vault Service to get the actual card number and expiry date. The CVV is handled differently: PCI DSS prohibits storing it after authorization, so it is collected from the customer at the time of the transaction and never retained. The Card Vault returns the decrypted card data over a secured internal call; card data is never exposed outside our system.

  7. External Authorization (Acquirer/Issuer): Now the Payment Service prepares a request to the external payment gateway/processor (which could be an acquirer’s API or a payment network connection). This might be a JSON API call: it sends card number, expiry, amount, merchant ID (or acquirer merchant ID), etc., to the acquirer.

    • This is typically the slowest step, as it involves leaving our system to call the bank network.
    • The response comes back with either approved (and an auth code, transaction ID) or declined (with a reason code), or an error (maybe timeout or network issue). The Payment Service receives this.
  8. Processing Response: If approved, the Payment Service marks the transaction as approved. It writes a transaction record to the database (if one was not already created) with status = “approved” (or “succeeded”) and creates a related ledger entry that credits the merchant’s balance with the amount (minus fees). If declined, it records the transaction as failed, along with the reason. If there was an error with no definitive response (such as a timeout or network issue), the outcome is unknown, because the first attempt may have succeeded at the issuer. In that case we might mark the transaction as “pending” and trigger a status check or manual follow-up. The charge is usually not retried immediately, since a blind retry could charge the customer twice; instead we can query the acquirer for the transaction’s status or rely on idempotency (a retry with the same key is either processed for the first time or recognized as a duplicate if the original went through).

  9. Post-Processing: After updating our records, the Payment Service will generate a response to return to the API caller (merchant). Typically, the response includes the transaction status (succeeded or failed), a unique charge ID in our system, and maybe details like captured amount, fees, etc. If failed, include an error message or code. This response goes back through the API Gateway to the merchant. From the merchant’s perspective, the API call to charge is now complete with a result. The customer at checkout sees “payment approved” (or error if declined).

  10. Asynchronous Events: Meanwhile, our system triggers follow-up processes:

    • The Payment Service (after committing the transaction) publishes a “Payment Succeeded” event to the internal Event Bus or sends a message to a queue that a payment is done. This event contains the charge ID, merchant, amount, etc.
    • The Webhook Service, subscribed to such events, will pick it up and look for any webhooks that the merchant has registered for “payment_succeeded”. It will then send an HTTP POST to the merchant’s callback URL with the data. This might happen within seconds of the transaction. If the merchant’s server is down, it will retry a few times over, say, the next hour. This ensures the merchant’s system is notified.
    • The Notification Service might see the event and if configured, send an email receipt to the customer.
    • A separate Analytics or Reporting Service could log the transaction to a data warehouse for long-term analysis (via an event or by tailing the transaction DB).
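The synchronous part of this flow (steps 4 to 8) can be sketched as a single orchestration function. This is a minimal, single-process sketch under stated assumptions, not a production design: the Card Vault, Fraud Service, and acquirer are stand-ins passed in as a dict and two callables, the idempotency store is an in-memory dict rather than a shared database or cache, and the names `PaymentService`, `charge`, `ChargeResult`, and the 0.9 fraud threshold are all hypothetical. It shows the two ideas the steps rely on: replaying a stored result for a repeated idempotency key, and recording an indefinite acquirer error as “pending” instead of retrying the charge.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional, Tuple


class Status(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    PENDING = "pending"  # no definitive acquirer answer; resolve via status query


@dataclass
class ChargeResult:
    charge_id: str
    status: Status
    reason: Optional[str] = None


_IN_PROGRESS = object()  # marker stored while the first request for a key runs


@dataclass
class PaymentService:
    vault: dict  # token -> card data (stand-in for the Card Vault Service)
    fraud_score: Callable[[str, int, str], float]  # stand-in for the Fraud Service
    acquirer: Callable[[dict, int, str], Tuple[bool, Optional[str]]]  # (approved, reason)
    fraud_threshold: float = 0.9  # hypothetical cut-off
    idempotency: dict = field(default_factory=dict)  # (merchant, key) -> result
    ledger: list = field(default_factory=list)  # (merchant_id, charge_id, amount)
    _seq: int = 0

    def charge(self, merchant_id: str, idem_key: Optional[str], token: str,
               amount: int, currency: str) -> ChargeResult:
        if amount <= 0:  # step 4: validation (amount in minor units, e.g. cents)
            raise ValueError("amount must be positive")
        key = (merchant_id, idem_key)
        if idem_key is not None:
            prior = self.idempotency.get(key)
            if prior is _IN_PROGRESS:  # a retry arrived while the first is running
                raise RuntimeError("request with this idempotency key in progress")
            if prior is not None:  # short-circuit: replay the stored result
                return prior
            self.idempotency[key] = _IN_PROGRESS
        try:
            result = self._process(merchant_id, token, amount, currency)
        except Exception:
            # Failed before reaching the acquirer, so no money moved: allow a retry.
            if idem_key is not None:
                del self.idempotency[key]
            raise
        if idem_key is not None:
            self.idempotency[key] = result
        return result

    def _process(self, merchant_id: str, token: str, amount: int,
                 currency: str) -> ChargeResult:
        self._seq += 1
        charge_id = f"ch_{self._seq}"
        # Step 5: fraud check; high-risk requests are declined here.
        if self.fraud_score(merchant_id, amount, currency) >= self.fraud_threshold:
            return ChargeResult(charge_id, Status.FAILED, "fraud_suspected")
        card = self.vault[token]  # step 6: internal detokenization
        try:
            approved, reason = self.acquirer(card, amount, currency)  # step 7
        except Exception:
            # Step 8: timeout or network error means the outcome is unknown, so
            # record "pending" for a later status check instead of retrying.
            return ChargeResult(charge_id, Status.PENDING, "no_definitive_response")
        if not approved:
            return ChargeResult(charge_id, Status.FAILED, reason)
        self.ledger.append((merchant_id, charge_id, amount))  # credit (fees omitted)
        return ChargeResult(charge_id, Status.SUCCEEDED)
```

A repeated call with the same merchant and key returns the stored `ChargeResult` without contacting the acquirer again, so a client that retries after a dropped connection cannot double-charge the customer.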

The refund flow is as follows:

Step 5: Database Schema

Merchants

Stores merchant (business) account details. This table is relatively small and can be kept on a primary shard or globally.

| Field Name | Data Type | Description |
|---|---|---|
| merchant_id (PK) | BIGINT | Unique merchant identifier (primary key). |
| name | VARCHAR(255) | Merchant’s business name. |
| email | VARCHAR(255) | Contact email (unique). |
| status | VARCHAR(50) | Account status (e.g., active, suspended). |
| created_at | TIMESTAMP | Timestamp when the merchant account was created. |
| available_balance | BIGINT | Current available balance for the merchant (in cents). Denormalized for quick access; updated via ledger entries. |
| pending_balance | BIGINT | Funds pending settlement (if applicable). Denormalized. |

Customers

Stores end-customer profiles for each merchant (buyers who saved payment info or were charged).

| Field Name | Data Type | Description |
|---|---|---|
| customer_id (PK) | BIGINT | Unique customer identifier. |
| merchant_id (FK) | BIGINT | Merchant who owns this customer. FK to Merchants(merchant_id). |
| name | VARCHAR(255) | Customer name. |
| email | VARCHAR(255) | Customer email (could be NULL if not provided). |
| phone | VARCHAR(50) | Customer phone number. |
| created_at | TIMESTAMP | Profile creation timestamp. |
| updated_at | TIMESTAMP | Last update timestamp. |
| default_payment_method | BIGINT | (Optional) FK to default payment method in PaymentMethods. |

Payment Methods

Stores tokenized payment details (cards, bank accounts) for customers.

| Field Name | Data Type | Description |
|---|---|---|
| payment_method_id (PK) | BIGINT | Unique payment method identifier. |
| merchant_id (FK) | BIGINT | Owner merchant. FK to Merchants(merchant_id). |
| customer_id (FK) | BIGINT | Customer who owns this payment method. FK to Customers(customer_id). |
| type | VARCHAR(50) | Payment type (card, bank_account, etc.). |
| token | VARCHAR(255) | Token/reference to payment info (e.g., vaulted card token). |
| card_brand | VARCHAR(50) | If type=card: card network (Visa, Mastercard, etc.). |
| card_last4 | VARCHAR(10) | If type=card: last 4 digits of card number. |
| card_exp_month | INT | If type=card: expiration month. |
| card_exp_year | INT | If type=card: expiration year. |
| bank_name | VARCHAR(100) | If type=bank_account: bank name (optional). |
| bank_last4 | VARCHAR(10) | If type=bank_account: last 4 digits of bank account. |
| created_at | TIMESTAMP | When the payment method was added. |
| updated_at | TIMESTAMP | Last update timestamp. |
| is_active | BOOLEAN | Whether the payment method is active (not deleted/invalid). |

Transactions

Core table for all payment transactions (charges, payments). This table is high-volume and critical.

| Field Name | Data Type | Description |
|---|---|---|
| transaction_id (PK) | BIGINT | Unique transaction ID. |
| merchant_id (FK) | BIGINT | Merchant who received the payment. FK to Merchants(merchant_id). |
| customer_id (FK) | BIGINT | Customer who made the payment. FK to Customers(customer_id). |
| payment_method_id (FK) | BIGINT | Payment method used. FK to PaymentMethods(payment_method_id). |
| amount | BIGINT | Transaction amount in cents (e.g., $10 = 1000 cents). |
| currency | VARCHAR(10) | ISO 4217 currency code (e.g., USD, EUR). |
| status | VARCHAR(50) | Transaction status (pending, succeeded, failed, etc.). |
| type | VARCHAR(50) | Transaction type (charge, auth, capture, etc.). |
| description | VARCHAR(255) | Description or order info (optional). |
| reference_code | VARCHAR(100) | External reference (e.g., order ID from merchant system). |
| processed_at | TIMESTAMP | When the transaction was processed (authorized/captured). |
| settled_at | TIMESTAMP | When funds settled (if applicable, e.g., for ACH or delayed capture). |
| created_at | TIMESTAMP | Creation timestamp (initial request time). |
| updated_at | TIMESTAMP | Last update timestamp. |

Transactional Integrity: Inserting a new transaction and updating balances/ledgers are done within a single ACID transaction to ensure all-or-nothing updates (money movement is never partially recorded).
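The all-or-nothing rule can be illustrated with Python’s built-in `sqlite3` module. This is a sketch under stated assumptions: the tables are trimmed to the columns the example needs, the fee is supplied by the caller, and `record_payment` is a hypothetical helper. Using the connection as a context manager commits the transaction if the block finishes and rolls it back if anything raises, so the transaction row, its ledger entries, and the denormalized balance always move together. On a multi-writer database such as PostgreSQL you would also lock the merchant row (for example with `SELECT ... FOR UPDATE`) before reading the balance; SQLite allows only one writer at a time, so the sketch omits this.

```python
import sqlite3

SCHEMA = """
CREATE TABLE merchants (
    merchant_id INTEGER PRIMARY KEY,
    available_balance INTEGER NOT NULL DEFAULT 0);
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY, merchant_id INTEGER NOT NULL,
    amount INTEGER NOT NULL, currency TEXT NOT NULL, status TEXT NOT NULL);
CREATE TABLE ledgers (
    ledger_id INTEGER PRIMARY KEY, merchant_id INTEGER NOT NULL,
    transaction_id INTEGER, type TEXT NOT NULL, amount INTEGER NOT NULL,
    currency TEXT NOT NULL, balance_after INTEGER NOT NULL);
"""


def record_payment(conn: sqlite3.Connection, merchant_id: int, amount: int,
                   currency: str, fee: int) -> int:
    """Insert a charge, its ledger entries, and the new balance in one transaction."""
    with conn:  # commits if the block completes, rolls back if anything raises
        cur = conn.execute(
            "INSERT INTO transactions (merchant_id, amount, currency, status) "
            "VALUES (?, ?, ?, 'succeeded')", (merchant_id, amount, currency))
        txn_id = cur.lastrowid
        row = conn.execute(
            "SELECT available_balance FROM merchants WHERE merchant_id = ?",
            (merchant_id,)).fetchone()
        if row is None:
            raise LookupError(f"unknown merchant {merchant_id}")  # undoes the INSERT
        balance = row[0]
        # Credits are positive and debits negative, as in the Ledgers table.
        for entry_type, delta in (("payment_credit", amount), ("fee_debit", -fee)):
            balance += delta
            conn.execute(
                "INSERT INTO ledgers (merchant_id, transaction_id, type, amount, "
                "currency, balance_after) VALUES (?, ?, ?, ?, ?, ?)",
                (merchant_id, txn_id, entry_type, delta, currency, balance))
        conn.execute(
            "UPDATE merchants SET available_balance = ? WHERE merchant_id = ?",
            (balance, merchant_id))
    return txn_id
```

For a merchant starting at zero, `record_payment(conn, 1, 1000, "USD", 59)` leaves two ledger rows with `balance_after` values 1000 and 941 and an available balance of 941; a call for an unknown merchant raises and leaves no transaction row behind.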

Ledgers

Records financial entries for merchants – every credit or debit affecting a merchant’s balance (payments, refunds, chargebacks, payouts, fees). This provides an audit trail and running balance.

| Field Name | Data Type | Description |
|---|---|---|
| ledger_id (PK) | BIGINT | Unique ledger entry ID. |
| merchant_id (FK) | BIGINT | Merchant to whom this ledger entry belongs. FK to Merchants(merchant_id). |
| transaction_id | BIGINT | Related transaction (if applicable). FK to Transactions(transaction_id). |
| type | VARCHAR(50) | Entry type: e.g., payment_credit, refund_debit, chargeback_debit, payout_debit, fee_debit, etc. |
| amount | BIGINT | Amount of this entry (in cents). Credits (incoming funds) are positive; debits (outgoing) are negative amounts or recorded separately by type. |
| currency | VARCHAR(10) | Currency (should match merchant’s transaction currency). |
| balance_after | BIGINT | Merchant’s balance after this entry was applied. |
| description | VARCHAR(255) | Description or reference (e.g., “Charge ID X”, “Payout to bank”, etc.). |
| created_at | TIMESTAMP | When the ledger entry was recorded. |
| settled_at | TIMESTAMP | If applicable, when the entry was settled (e.g., payout completion date). |

Refunds

Tracks refunds issued for transactions.

| Field Name | Data Type | Description |
|---|---|---|
| refund_id (PK) | BIGINT | Unique refund identifier. |
| merchant_id (FK) | BIGINT | Merchant who issued the refund. FK to Merchants(merchant_id). |
| transaction_id (FK) | BIGINT | The original transaction being refunded. FK to Transactions(transaction_id). |
| amount | BIGINT | Refunded amount (in cents). |
| status | VARCHAR(50) | Refund status (pending, succeeded, failed). |
| reason | VARCHAR(255) | Reason for refund (customer request, product return, etc.). |
| created_at | TIMESTAMP | When the refund was initiated. |
| processed_at | TIMESTAMP | When the refund was completed (money actually refunded). |

Webhooks

Stores outgoing webhook events to notify merchants of relevant events (e.g., a transaction succeeded, a refund completed).

| Field Name | Data Type | Description |
|---|---|---|
| webhook_id (PK) | BIGINT | Unique webhook event ID. |
| merchant_id (FK) | BIGINT | Merchant that should receive the webhook. FK to Merchants(merchant_id). |
| event_type | VARCHAR(100) | Type of event (e.g., transaction.succeeded, refund.created). |
| event_data | TEXT | Payload data (JSON or serialized) relevant to the event. |
| status | VARCHAR(50) | Delivery status (pending, sent, failed). |
| attempts | INT | Number of delivery attempts made. |
| next_retry_at | TIMESTAMP | Next scheduled retry time if last attempt failed. |
| created_at | TIMESTAMP | When the event was generated. |
| delivered_at | TIMESTAMP | When the event was successfully delivered (if at all). |
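The `status`, `attempts`, and `next_retry_at` columns describe a small state machine for each webhook event. Below is a minimal sketch of one delivery attempt, assuming exponential backoff starting at one minute and a hypothetical cap of six attempts (so the last attempt happens about 31 minutes after the first, consistent with retrying over roughly the next hour). `WebhookEvent` mirrors the table’s columns, and `send` is a stand-in for the HTTP POST to the merchant’s callback URL that returns whether delivery succeeded.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

MAX_ATTEMPTS = 6                   # hypothetical cap
BASE_DELAY = timedelta(minutes=1)  # waits of 1, 2, 4, 8, 16 minutes: 31 minutes total


@dataclass
class WebhookEvent:
    webhook_id: int
    merchant_id: int
    event_type: str
    event_data: str                # JSON payload
    status: str = "pending"        # pending -> sent | failed
    attempts: int = 0
    next_retry_at: Optional[datetime] = None
    delivered_at: Optional[datetime] = None


def attempt_delivery(event: WebhookEvent, send: Callable[[WebhookEvent], bool],
                     now: datetime) -> None:
    """Make one delivery attempt and update the row's delivery columns."""
    event.attempts += 1
    if send(event):                # e.g. the merchant endpoint returned HTTP 2xx
        event.status = "sent"
        event.delivered_at = now
        event.next_retry_at = None
    elif event.attempts >= MAX_ATTEMPTS:
        event.status = "failed"    # give up; stop scheduling retries
        event.next_retry_at = None
    else:                          # exponential backoff: 1, 2, 4, ... minutes
        event.next_retry_at = now + BASE_DELAY * 2 ** (event.attempts - 1)
```

A delivery worker would repeatedly select rows with `status = 'pending'` and `next_retry_at` at or before the current time, call `attempt_delivery`, and write the updated columns back.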

Event Logs

Captures all major system events for audit and troubleshooting (this could include security events, system errors, or high-level actions).

| Field Name | Data Type | Description |
|---|---|---|
| event_id (PK) | BIGINT | Unique event log ID. |
| merchant_id | BIGINT | Merchant related to the event (nullable if global event). |
| event_type | VARCHAR(100) | Type of event (e.g., transaction.created, refund.processed, login, api_call). |
| event_details | TEXT | Details about the event (could be JSON or message text). |
| created_at | TIMESTAMP | Timestamp of the event. |
| user_id | BIGINT | (Optional) User or staff who triggered the event (if applicable). |
| source | VARCHAR(50) | Source of event (system, merchant_portal, API, etc.). |

Step 6: Detailed Component Design

Now we’ll explore key components in detail.

6.1 API Endpoints

These all funnel through the gateway to appropriate internal handlers.

6.2 Payment Service (Core Orchestration)

This service is the brain of the operation. It acts as a payment orchestration layer that coordinates between the database, external services, and internal auxiliary services.

Internal Structure: We could design the Payment Service itself in a modular way or even as multiple microservices. For instance, some organizations split the responsibilities into an Orchestrator service, a Connector service for external calls, etc. For clarity, we’ll discuss it as one service with distinct sub-components/tasks:

Data Model & Storage Choice: For core transactional data, a relational database is a solid choice due to its ACID guarantees. Financial transactions require atomicity and durability. A SQL database (like Postgres or MySQL) can enforce constraints (like unique idempotency keys, referential integrity between a charge and a refund record, etc.) and can do multi-row transactions easily.
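As an example of the database enforcing a constraint, a unique key on (merchant, idempotency key) makes the “record this key as in-progress” step race-free: if two copies of a request arrive at once, only one INSERT can succeed. The sketch below uses `sqlite3`; the `idempotency_keys` table and the helper functions `claim_key`, `save_response`, and `stored_response` are hypothetical, since the schema in Step 5 does not define them.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE idempotency_keys (
    merchant_id INTEGER NOT NULL,
    idem_key    TEXT    NOT NULL,
    response    TEXT,             -- NULL while the request is in progress
    PRIMARY KEY (merchant_id, idem_key));
"""


def claim_key(conn: sqlite3.Connection, merchant_id: int, key: str) -> bool:
    """Return True if this request claimed the key, False if it was already seen."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO idempotency_keys (merchant_id, idem_key) VALUES (?, ?)",
                (merchant_id, key))
        return True
    except sqlite3.IntegrityError:  # primary-key violation: duplicate request
        return False


def save_response(conn: sqlite3.Connection, merchant_id: int, key: str,
                  response: str) -> None:
    """Store the final response so later retries can replay it."""
    with conn:
        conn.execute(
            "UPDATE idempotency_keys SET response = ? "
            "WHERE merchant_id = ? AND idem_key = ?", (response, merchant_id, key))


def stored_response(conn: sqlite3.Connection, merchant_id: int,
                    key: str) -> Optional[str]:
    """Return the saved response, or None if absent or still in progress."""
    row = conn.execute(
        "SELECT response FROM idempotency_keys WHERE merchant_id = ? AND idem_key = ?",
        (merchant_id, key)).fetchone()
    return None if row is None else row[0]
```

A request that gets `False` from `claim_key` then checks `stored_response`: a value means the result can be replayed, while `None` means the original request is still being processed.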

Synchronous vs Asynchronous Processing: The Payment Service does most steps synchronously to provide a result in the API response. We choose sync for the primary flow because merchants (and customers) expect an immediate answer to a payment attempt. However, behind the scenes we use asynchronous processing for things that need not block the customer’s request: emailing receipts, notifying external systems, etc., are done async via events after the main flow commits.

Card Vault Service: Let’s detail this component since it’s critical for security:

External Payment Integration Service:

6.3 Retryable vs. Non-Retryable Errors

Retryable Errors are issues that might be resolved if tried again. These fall into two categories:

Non-Retryable Errors are failures where retries won’t help, so the transaction should be considered a permanent failure:

Retry Strategies

Here’s a concise overview of common retry strategies:

  1. Fixed Interval

    • Retries happen at a constant time interval (e.g., every 2 minutes).
    • Simple to implement, but doesn’t adapt to changing load or error conditions.
  2. Exponential Backoff

    • Each retry increases the wait time exponentially (e.g., 1 min, then 2, 4, 8...).
    • Gives downstream systems time to recover from overload or transient failures.
  3. Linear Backoff

    • The delay increases by a fixed increment each time (e.g., retry after 5 min, then 10, then 15...).
    • Less aggressive than exponential but still avoids constant hammering at short intervals.
  4. Jitter-Based (Randomized) Backoff

    • Adds randomness to the delay (e.g., wait = base_delay * (1 + random factor)).
    • Helps prevent synchronized retry “storms” when multiple clients fail at once.
  5. Delayed/Deferred Queues

    • Schedules retries for a specific future time rather than immediately (e.g., next day for “insufficient funds”).
    • Useful for “soft decline” scenarios where waiting a longer period is more likely to succeed.

Often, real systems combine these strategies. For example, a system might make one quick retry after a few seconds to ride out a transient network blip, use exponential backoff with jitter for subsequent attempts, and, for certain decline reasons such as “insufficient funds,” schedule a much later attempt (e.g., 24 hours later).
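The individual delays and the combined policy can be written as small pure functions. The base values mirror the examples above (2 minutes fixed, 5-minute linear steps, exponential growth from 1 minute); the 1-hour cap, the 5-second first retry, and the `insufficient_funds` reason string are assumptions for illustration, and all function names are hypothetical.

```python
import random
from typing import Optional

SOFT_DECLINE_DELAY = 24 * 3600.0  # seconds; "insufficient funds" -> try tomorrow


def fixed_delay(attempt: int, interval: float = 120.0) -> float:
    return interval  # same wait every time (2 minutes)


def linear_delay(attempt: int, step: float = 300.0) -> float:
    return step * attempt  # 5, 10, 15 minutes, ...


def exponential_delay(attempt: int, base: float = 60.0, cap: float = 3600.0) -> float:
    return min(cap, base * 2 ** (attempt - 1))  # 1, 2, 4, 8 minutes, ..., capped


def with_jitter(delay: float, rng: random.Random) -> float:
    return delay * (1 + rng.random())  # base_delay * (1 + random factor in [0, 1))


def next_retry_delay(attempt: int, decline_reason: Optional[str] = None,
                     rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if rng is None:
        rng = random.Random()
    if decline_reason == "insufficient_funds":
        return SOFT_DECLINE_DELAY  # deferred queue for a soft decline
    if attempt == 1:
        return 5.0  # quick retry for a transient network blip
    return with_jitter(exponential_delay(attempt - 1), rng)
```

Passing a seeded `random.Random` makes the jittered delays reproducible in tests, while production callers can rely on the default unseeded generator.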

Failure Handling After Exhausting Retries

If all retry attempts are exhausted and the transaction still fails, the system should gracefully handle the permanent failure:

  1. Mark Transaction as Failed: Sets the transaction status to a final “FAILED,” indicating no more automatic attempts will occur.

  2. Notify Merchant/Customer: Sends a webhook or email notification explaining the final failure status.

  3. Log for Audit & Analytics: Records all retry attempts, timestamps, and error codes for future reference.

  4. Dead Letter Queue (DLQ): Optionally places the permanently failed transaction into a specialized queue if manual investigation or corrective actions are needed (e.g., an unknown error code from a payment processor that requires deeper follow-up).
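These four steps can be sketched with plain Python lists standing in for the webhook queue, the audit log table, and the DLQ. The `Txn` type, the `transaction.failed` event name, and the set of “known” failure codes used to decide what goes to the DLQ are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical: failure codes we understand; anything else goes to the DLQ.
KNOWN_FAILURE_CODES = {"card_declined", "expired_card", "insufficient_funds"}


@dataclass
class Txn:
    txn_id: str
    merchant_id: str
    status: str
    last_error: Optional[str] = None
    attempts: List[dict] = field(default_factory=list)  # one dict per retry attempt


def finalize_failure(txn: Txn, webhook_queue: list, audit_log: list, dlq: list) -> None:
    """Handle a transaction whose retries are exhausted."""
    txn.status = "FAILED"  # 1. final state: the retry scheduler skips it from now on
    webhook_queue.append({  # 2. notify the merchant (an email could be queued too)
        "merchant_id": txn.merchant_id,
        "event_type": "transaction.failed",
        "data": {"transaction_id": txn.txn_id, "error": txn.last_error},
    })
    for attempt in txn.attempts:  # 3. keep every attempt for audit and analytics
        audit_log.append({"transaction_id": txn.txn_id, **attempt})
    if txn.last_error not in KNOWN_FAILURE_CODES:
        dlq.append(txn.txn_id)  # 4. unknown error: needs manual investigation
```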

6.4 Asynchronous Workflow and Communication

As noted, our design uses asynchronous messaging for certain tasks to improve throughput and decouple services. Let’s clarify how and where we use asynchronous processes:

Consistency Consideration: Whenever we introduce asynchrony, we have to think about consistency. For example, if the Payment Service commits a transaction and the event publish then fails, the transaction is recorded but downstream services (webhooks, receipts, analytics) are never notified. One solution is to treat the database as the source of truth and run a separate process that scans for new transactions and emits events, which makes event publishing retryable (consumers must then tolerate duplicates). Another is the transactional outbox pattern: write the event to an outbox table in the same database transaction as the payment, and have an event relay service read from that table and publish to Kafka, so no event is lost even if the service crashes between the commit and the publish. Given the complexity, we’ll assume a robust publishing mechanism (for example, Kafka’s transactional features or the outbox pattern) so that events are not lost.
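A minimal sketch of the transactional outbox using `sqlite3`: `commit_payment` writes the transaction row and its event in one database transaction, and `relay_once` publishes unsent events and marks each one only after publishing succeeds. The `outbox` table layout, the function names, and the `publish` callback (standing in for a Kafka producer) are assumptions. Because the relay can crash after publishing but before marking a row, delivery is at-least-once, so consumers should deduplicate by event ID.

```python
import json
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY, merchant_id INTEGER NOT NULL,
    amount INTEGER NOT NULL, status TEXT NOT NULL);
CREATE TABLE outbox (
    event_id INTEGER PRIMARY KEY, event_type TEXT NOT NULL,
    payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0);
"""


def commit_payment(conn: sqlite3.Connection, merchant_id: int, amount: int) -> int:
    """Write the payment and its event atomically: both rows commit or neither does."""
    with conn:
        cur = conn.execute(
            "INSERT INTO transactions (merchant_id, amount, status) "
            "VALUES (?, ?, 'succeeded')", (merchant_id, amount))
        payload = {"charge_id": cur.lastrowid, "merchant_id": merchant_id,
                   "amount": amount}
        conn.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                     ("payment_succeeded", json.dumps(payload)))
    return payload["charge_id"]


def relay_once(conn: sqlite3.Connection, publish: Callable[[str, dict], None],
               batch: int = 100) -> int:
    """Publish unsent events in order, marking each one only after publish succeeds."""
    rows = conn.execute(
        "SELECT event_id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY event_id LIMIT ?", (batch,)).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # raises -> row stays unpublished
        # A crash right here re-publishes this event later (at-least-once delivery).
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE event_id = ?",
                         (event_id,))
    return len(rows)
```

The relay would run in a loop or on a schedule; if the broker is unavailable, `publish` raises and the unsent rows are simply picked up on the next run.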

Step 7: Scalability and Performance Strategies
