Messaging System
📘 Learn Messaging System from zero
Start from the problem. Service A wants Service B to do something, but A should not wait for B, and B might be down or slow. If A calls B directly (synchronously), A blocks on B's latency, and A's request fails whenever B fails. They are tightly coupled. A messaging system breaks that coupling by putting a durable intermediary (a broker) in the middle.
Analogy — the restaurant ticket rail. A waiter (the producer) does not hand-deliver each order to a chef and stand there waiting. They clip the ticket to a rail (the queue/broker) and walk off to serve other tables. Cooks (the consumers) pull tickets when free. During a rush, tickets pile up on the rail (buffering) instead of overwhelming the kitchen. If one cook is out, another grabs the next ticket. Waiter and cooks never block on each other.
Worked example. An e-commerce checkout. Done synchronously, "place order" calls payment, inventory, shipping, and email in series; total latency is the sum of all four, and any one failure breaks checkout. Instead, the order service writes one OrderPlaced message and returns to the user almost immediately. Independent consumers each react: billing charges the card, inventory decrements stock, shipping creates a label, email sends a receipt. If the email service is down, its messages wait on the broker and are processed when it recovers — the order still succeeded, because the broker persisted the event.
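The "messages wait on the broker" behaviour can be sketched with a toy in-memory log. This is only an illustration under stated assumptions: the Broker class, its publish and poll methods, and the subscriber names are hypothetical, and a real broker would persist the log to disk and commit an offset only after the consumer has finished processing.

```python
from collections import defaultdict


class Broker:
    """Toy stand-in for a durable broker: an append-only log plus one
    read offset per subscriber (hypothetical, in-memory only)."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)

    def publish(self, message):
        # The producer is done as soon as the append succeeds.
        self.log.append(message)

    def poll(self, subscriber):
        """Return the messages this subscriber has not read yet and
        advance its offset past them."""
        start = self.offsets[subscriber]
        pending = self.log[start:]
        self.offsets[subscriber] = len(self.log)
        return pending


broker = Broker()
broker.publish({"type": "OrderPlaced", "order_id": 1})
broker.publish({"type": "OrderPlaced", "order_id": 2})

# Billing is up and keeps pace with the log.
billing_seen = broker.poll("billing")

# Email is down: it never polls, so its offset stays at 0 while
# new orders keep arriving.
broker.publish({"type": "OrderPlaced", "order_id": 3})

# Email recovers and catches up on everything it missed.
email_seen = broker.poll("email")
```

The order service never learns that email was down. The only visible effect is that the email subscriber's offset lagged behind the end of the log for a while.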
Two core shapes exist: point-to-point queues (each message consumed once, by one worker in the pool) for distributing work, and pub/sub topics (each message delivered to every subscriber group) for broadcasting events.
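A minimal sketch of the difference between the two shapes, assuming hypothetical PointToPointQueue and PubSubTopic classes. Here the topic is modelled as one private queue per subscriber group, which is one common way to build fan-out; log-based systems get the same effect with per-group offsets instead.

```python
from collections import deque


class PointToPointQueue:
    """Each message is handed to exactly one of the competing workers."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Whichever worker calls first takes the message; no one else sees it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Each message is copied into every subscriber group's own queue."""

    def __init__(self):
        self._group_queues = {}

    def subscribe(self, group):
        # Subscribing twice with the same group name returns the same queue.
        return self._group_queues.setdefault(group, PointToPointQueue())

    def publish(self, message):
        for queue in self._group_queues.values():
            queue.send(message)


# Queue: two workers share the work, each job is taken once.
work = PointToPointQueue()
for job in ["job-1", "job-2"]:
    work.send(job)
worker_a_got = work.receive()
worker_b_got = work.receive()

# Topic: both groups receive their own copy of the same event.
topic = PubSubTopic()
billing = topic.subscribe("billing")
email = topic.subscribe("email")
topic.publish("OrderPlaced#1")
billing_got = billing.receive()
email_got = email.receive()
```

Within a single group the queue semantics still apply, so adding workers to the billing group spreads the load without duplicating the event.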
Key insight: a messaging system trades a little latency and a duplicate-handling burden for decoupling, durability, and elasticity — producers and consumers can fail, scale, and deploy independently.
✨ Added by the guide to build intuition — not from the source course.
🎯 Guided practice
Easy — pick the right shape. A photo-upload service must generate a thumbnail for every uploaded image. Uploads spike 10x at peak; thumbnailing is CPU-heavy and slow. Should you call the thumbnailer synchronously, and what messaging shape fits?
Step 1 — does the user need the result now? No. The user needs "upload accepted"; the thumbnail can appear seconds later. That signals async, not synchronous RPC.
Step 2 — who consumes each message? Exactly one worker should thumbnail each image; you don't want every worker reprocessing the same one. That is a point-to-point queue with competing consumers, not pub/sub.
Step 3 — handle the spike. The upload service enqueues an ImageUploaded message and returns immediately. A pool of thumbnail workers pulls from the queue. At peak the queue absorbs the burst (load leveling); you scale workers out to drain the backlog.
Step 4 — failures. Assume at-least-once delivery, which means the same message can arrive more than once, so make the worker idempotent (skip if the thumbnail already exists, keyed by image ID). A repeatedly failing image goes to a dead-letter queue (DLQ) after N retries so it never blocks the head of the queue.
Answer: async queue, competing consumers, idempotent workers, DLQ.
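The worker loop behind that answer might look like the sketch below. Everything named here is an assumption made for illustration: MAX_ATTEMPTS stands in for the N in the text, generate_thumbnail is a placeholder for the real CPU-heavy work, and a deque, a dict and a list stand in for the queue, the thumbnail store and the DLQ.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry limit (the "N" in the text)


def generate_thumbnail(image_id):
    """Placeholder for the real work; fails for one poison image."""
    if image_id == "corrupt":
        raise ValueError("cannot decode image")
    return f"thumb-of-{image_id}"


def drain(queue, thumbnails, dead_letters):
    """Process (image_id, attempts_so_far) messages until the queue is empty."""
    while queue:
        image_id, attempts = queue.popleft()
        if image_id in thumbnails:
            # Idempotency check: a duplicate delivery is a no-op.
            continue
        try:
            thumbnails[image_id] = generate_thumbnail(image_id)
        except ValueError:
            attempts += 1
            if attempts >= MAX_ATTEMPTS:
                # Park the poison message so it stops consuming retries.
                dead_letters.append(image_id)
            else:
                # Retry later, behind the other work.
                queue.append((image_id, attempts))


# "cat" is delivered twice to simulate at-least-once delivery.
queue = deque([("cat", 0), ("corrupt", 0), ("cat", 0), ("dog", 0)])
thumbnails, dead_letters = {}, []
drain(queue, thumbnails, dead_letters)
```

After the run, thumbnails holds one entry each for "cat" and "dog", and "corrupt" sits in dead_letters after three failed attempts. The healthy images were never held up behind the failing one.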
Medium — ordering and fan-out together. A ride-hailing app emits driver-location updates and trip events. Two teams consume them: a live-map service (wants every event, broadcast) and a billing service (must process a single trip's events in order: start → ... → end). Design the messaging topology.
Step 1 — fan-out need. Two independent teams want the same events, so use a pub/sub topic with separate consumer groups, not a shared single-consumer queue. Each group tracks its own offset and reads independently, so a slow live-map never holds back billing.
Step 2 — ordering need. Global ordering across all trips would force every event through a single partition, which won't scale. But billing only needs ordering per trip. So partition by trip_id: the partition key hashes each trip to one partition, and a partitioned log such as Kafka guarantees order within a partition. All of one trip's events land in the same partition, in the order they were written.
Step 3 — parallelism. Trips spread across partitions, so billing runs one consumer per partition in parallel: you get both per-trip ordering and throughput. Note the constraint: the partition count caps consumer parallelism within a group (extra consumers sit idle).
Step 4 — the trap. Don't choose the key carelessly. Keying by driver_id would serialize all of one driver's events into a single partition. That doesn't give wrong ordering, but it (a) creates hot partitions for busy drivers and (b) needlessly couples unrelated trips, capping throughput. And never assume cross-partition order: the live map must treat events from different partitions as unordered relative to each other, since partitions are consumed concurrently. Choosing the partition key is the design decision.
Answer: pub/sub topic, partition by trip_id, separate consumer groups, per-partition ordering.
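Key-based partitioning can be sketched in a few lines. This is an illustration, not any broker's actual partitioner: NUM_PARTITIONS, partition_for and events_for are hypothetical names, and CRC32 is used only because it is a stable hash available in the standard library (Python's built-in hash() is randomized per process for strings).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Events from two trips arrive interleaved.
events = [
    ("trip-1", "start"),
    ("trip-2", "start"),
    ("trip-1", "location"),
    ("trip-2", "end"),
    ("trip-1", "end"),
]

# Each partition is an append-only list, so order within it is preserved.
partitions = defaultdict(list)
for trip_id, event_type in events:
    partitions[partition_for(trip_id)].append((trip_id, event_type))


def events_for(trip_id):
    """Read one trip's events back from the single partition that owns it."""
    owner = partitions[partition_for(trip_id)]
    return [event for trip, event in owner if trip == trip_id]
```

Because the mapping depends only on the key, events_for("trip-1") yields start, location, end in that order whether or not trip-2 happens to share the partition. Nothing in this sketch says how trip-1's events are ordered relative to events in another partition, which is exactly the cross-partition caveat from Step 4.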
✨ Added by the guide — work these before the full problem set.