CPU & Server-Count Playbook
The two formulas nobody teaches
This is the dimension almost every guide skips. Two equivalent lenses:
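Little’s Law: concurrency N = λ × W (arrival rate × latency).
CPU cores: cores = (QPS × latency_ms) ÷ (1000 × target_utilization), where latency_ms is the CPU time each request consumes. This is Little’s Law with W set to CPU time per request, then divided by the utilization you are willing to run at.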
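As a minimal sketch, the two formulas can be written as small Python functions. The function names, parameter names, and the 2,000 QPS / 5 ms inputs are illustrative assumptions, not part of the lesson; the point is only to show that Little's Law with W set to CPU time gives the busy-core count, and dividing by utilization gives the provisioned-core count.

```python
def little_concurrency(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's Law: average number of requests in the system, N = lambda * W."""
    return arrival_rate_per_s * latency_s


def cores_needed(qps: float, cpu_ms_per_request: float, target_utilization: float) -> float:
    """cores = (QPS * CPU ms per request) / (1000 * target utilization)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return (qps * cpu_ms_per_request) / (1000 * target_utilization)


# Illustrative inputs (assumed, not from the lesson): 2,000 QPS, 5 ms CPU each.
busy_cores = little_concurrency(2_000, 5 / 1000)  # W is CPU time, so N = busy cores
provisioned = cores_needed(2_000, 5, 0.5)         # same work, run at 50% utilization
print(f"busy cores ~ {busy_cores:.1f}, provisioned cores ~ {provisioned:.1f}")
```

At 100% utilization the two functions return the same number, which is the sense in which the lenses are equivalent.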
Worked example
10,000 QPS, each request burns ~10 ms of CPU, target 70% utilization:
- Work = 10,000 × 10 ms = 100,000 ms of CPU work per second = 100 core-seconds/sec (100 cores fully busy).
- 100 cores ÷ 0.7 utilization ⇒ ~143 cores; 143 ÷ 8 ≈ 17.9, so round up to ~18 eight-core servers (+ headroom).
- Cross-check with per-box anchors: 10K QPS ÷ ~1–5K QPS/server ≈ 2–10 servers if requests are cheap; the CPU math catches that 10 ms requests are not cheap.
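The arithmetic above can be reproduced with a short Python sketch. The helper name `size_fleet` is hypothetical; the inputs (10,000 QPS, 10 ms CPU, 70% utilization, eight-core servers) are the ones from the worked example. It also derives the QPS each server ends up carrying, which is a convenient way to compare against the per-box anchor.

```python
import math


def size_fleet(qps, cpu_ms_per_request, target_utilization, cores_per_server):
    """Return (busy cores, provisioned cores, whole servers) for a CPU-bound service."""
    busy_cores = qps * cpu_ms_per_request / 1000            # core-seconds of work per second
    provisioned_cores = busy_cores / target_utilization     # leave headroom for bursts
    servers = math.ceil(provisioned_cores / cores_per_server)  # cannot buy a fraction of a box
    return busy_cores, provisioned_cores, servers


busy, cores, servers = size_fleet(10_000, 10, 0.7, 8)
qps_per_server = 10_000 / servers

print(f"busy cores: {busy:.0f}")
print(f"provisioned cores: {cores:.0f}")
print(f"servers: {servers}")
print(f"QPS per server: {qps_per_server:.0f}")
```

The resulting per-server load comes out well below the ~1–5K QPS/server anchor, which is the signal that 10 ms of CPU per request is expensive.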
CPU-bound vs IO-bound — this decides HOW you scale
| | CPU-bound | IO-bound |
|---|---|---|
| Bottleneck | Computation (encoding, ML, compression) | Waiting on DB / network / disk |
| Concurrency vs cores | N ≈ cores | N ≫ cores (threads mostly wait) |
| Scale by | Add cores / machines | Add concurrency (threads/async) + fix the slow dependency |
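To make the "Concurrency vs cores" row concrete, here is a rough sketch that applies Little's Law twice: once with total latency (CPU plus waiting) to get in-flight requests, and once with CPU time alone to get busy cores. The function name and the two workload profiles are assumptions chosen for illustration, not measurements.

```python
def in_flight_and_busy_cores(qps, cpu_ms, wait_ms):
    """Little's Law applied to total latency and to CPU time only."""
    in_flight = qps * (cpu_ms + wait_ms) / 1000  # N: requests in the system at once
    busy_cores = qps * cpu_ms / 1000             # cores actually computing
    return in_flight, busy_cores


# Hypothetical workloads, both at 1,000 QPS with 100 ms end-to-end latency.
cpu_bound = in_flight_and_busy_cores(1_000, cpu_ms=100, wait_ms=0)
io_bound = in_flight_and_busy_cores(1_000, cpu_ms=5, wait_ms=95)

for name, (n, cores) in (("CPU-bound", cpu_bound), ("IO-bound", io_bound)):
    print(f"{name}: in-flight ~ {n:.0f}, busy cores ~ {cores:.0f}")
```

Both workloads hold the same number of requests in flight, but the IO-bound one keeps only a small fraction of that many cores busy, so it needs threads or async tasks rather than more cores.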
The kitchen analogy (why 70%, not 100%)
You hire 30 chefs for a 20-dish peak (about 67% busy) so a sudden rush doesn’t halt the kitchen. Run servers at ~70% so latency stays sane under bursts: queues explode as utilization approaches 100%.
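One way to see why queues explode near 100% is the textbook M/M/1 queue, where mean time in the system is service time ÷ (1 − utilization). The M/M/1 model (single server, random arrivals and service times) is an assumption brought in here for illustration; the lesson does not name a queueing model, and real multi-core servers behave somewhat better, but the shape of the curve is similar.

```python
def mm1_time_in_system_ms(service_ms: float, utilization: float) -> float:
    """Mean time in system (queueing + service) for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_ms / (1 - utilization)


# Assumed 10 ms service time; sweep utilization toward 100%.
for rho in (0.5, 0.7, 0.9, 0.99):
    print(f"utilization {rho:.0%}: ~{mm1_time_in_system_ms(10, rho):.0f} ms")
```

Under this model, latency grows gently up to roughly 70% utilization and then climbs steeply, which is the intuition behind the ~70% target.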
Formulas are standard/public-domain engineering math. Approach and reference-table format adapted from the System Design Primer (CC BY 4.0), Jeff Dean’s latency numbers, the DesignGurus capacity-estimation guide, and Little’s Law.