Quorum
Step 14 in the System Design path · 1 concepts · 0 problems
📘 Learn Quorum from zero
What it is. A quorum is the minimum number of replica votes an operation must collect to count as done. In a leaderless store that copies each datum onto N nodes, you require a write to be acknowledged by W nodes and a read to gather R nodes. Choosing W and R so their sets are forced to overlap (W + R > N) lets the system stay available during failures while bounding how stale a read can be.
The problem it solves. If a write must wait for all N copies, one slow or down node stalls every write. If you write to just one and read from just one, a reader can easily hit a stale copy. Quorums let you dial the overlap so reads still observe recent writes without demanding unanimity.
Analogy. Five friends share a fact that changes. Rule: to update it you must personally tell at least 3 of them (W=3); to look it up you must ask at least 3 and take the newest answer (R=3). Any group of 3 and any other group of 3 out of 5 must share at least one person (3+3=6 > 5), so whoever you ask includes someone who heard the latest update.
Worked example. N=3 replicas A, B, C; W=2, R=2 (W+R=4>3). You write x=10 (version 5); it lands on A and B and acks. C was briefly down and still holds x=9 (version 4). Later you read with R=2 and ask B and C, getting {v5:10, v4:9}. The overlap guarantees at least one node in any 2-node read set has v5 — here, B. You compare versions, return 10, and read-repair pushes v5 to C. Correct, despite C being stale.
Key insight: W + R > N forces every read set to share a node with every write set, so a read observes the latest completed write — but only versioning lets you recognize it, and only consensus gives true ordering.
✨ Added by the guide to build intuition — not from the source course.
Lessons in this topic
🎯 Guided practice
- Easy — Tune the knobs. A system has
N=5replicas. The product wants reads to never be stale (bounded by the quorum overlap) but writes are far more frequent than reads, so writes should be cheap. ChooseWandR.Reasoning: The overlap condition is
W + R > N, i.e.W + R ≥ 6. To make writes cheap, minimizeW. The smallest validWisW=1, which forcesR ≥ 5, soR=5. Answer:W=1, R=5(fast writes, full-replica reads). If reads were the hot path instead, flip toW=5, R=1. The pattern: push the cost onto the rarer operation while preserving the overlap inequality. Caveat worth voicing: withW=1a write that acks from one node, then loses that node before it replicates, is lost — so very lowWtrades durability for write latency. - Medium — Failure tolerance under a majority quorum.
N=5, balanced majorityW=R=3(W+R=6>5). (a) How many node failures can writes tolerate? (b) Two clients concurrently writex=Ato nodes {1,2,3} andx=Bto nodes {3,4,5} — both reach quorum. What happens, and how do you resolve it?Reasoning (a): A write needs
W=3reachable replicas, so it survives up toN − W = 2failures; reads likewise tolerateN − R = 2. Majority quorum (⌊N/2⌋+1) maximizes failure tolerance while keepingW+R>N, which is why it is the standard default and the size consensus protocols require to commit.Reasoning (b): Both writes succeed because each reached 3 nodes — quorum bounds staleness, it does not serialize or prevent conflicting concurrent writes. Node 3 receives both, with no happens-before relation between them. The fix lives at the versioning layer, not the quorum layer: attach vector clocks (the approach Amazon's Dynamo paper introduced; covered in DDIA ch. 5). A later read sees A and B as causally concurrent (neither dominates), so the system must resolve the conflict — either last-write-wins by timestamp (lossy: silently discards one update) or by returning both siblings for application-level merge (e.g. unioning two shopping carts). The lesson: quorum guarantees you'll see the latest completed writes; versioning is what lets you order or merge them.
✨ Added by the guide — work these before the full problem set.