
CPU & Server-Count Playbook

The two formulas nobody teaches

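CPU is the dimension many capacity guides skip. Two closely related lenses:

Little’s Law: concurrency N = λ × W (arrival rate × average time a request spends in the system).
CPU cores: cores = (QPS × latency_ms) ÷ (1000 × target_utilization), where latency_ms is the CPU time each request consumes in milliseconds (not wall-clock latency, which also includes waiting) and target_utilization is a fraction such as 0.7.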
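As a rough sketch, the two formulas above translate directly into two small functions. The function and parameter names are illustrative and not from the original; `cpu_ms_per_request` plays the role of latency_ms in the cores formula, and the numbers in the usage lines are made up for demonstration.

```python
def in_flight_requests(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's Law: N = lambda * W (average requests in the system)."""
    return arrival_rate_per_s * latency_s


def cores_needed(qps: float, cpu_ms_per_request: float,
                 target_utilization: float) -> float:
    """cores = (QPS * cpu_ms) / (1000 * target_utilization)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return (qps * cpu_ms_per_request) / (1000 * target_utilization)


if __name__ == "__main__":
    # Hypothetical service: 2,000 req/s, 50 ms wall-clock latency, 5 ms CPU each.
    print("in flight:", in_flight_requests(2_000, 0.050))
    print("cores at 50% target:", cores_needed(2_000, 5, 0.5))
```

Note that the first function takes wall-clock latency while the second takes CPU time only; mixing the two up is the most common way to overestimate cores for IO-heavy services.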

Worked example

10,000 QPS, each request burns ~10 ms of CPU, target 70% utilization:

Work = 10,000 × 10 ms = 100,000 ms of CPU work per second = 100 core-seconds/sec (100 cores fully busy).
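100 cores ÷ 0.7 utilization ⇒ ~143 cores ⇒ 143 ÷ 8 ≈ 17.9, so ~18 eight-core servers (+ headroom).
Cross-check with per-box anchors: 10K QPS ÷ ~1–5K QPS/server ≈ 2–10 servers if requests are cheap; the CPU math catches that 10 ms requests are not cheap. An eight-core box at 70% utilization sustains only about 8 × 0.7 × 100 = 560 such requests per second.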
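The arithmetic above can be reproduced in a few lines. This is a minimal sketch using the example's numbers; the helper name `size_fleet` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import math


def size_fleet(qps, cpu_ms_per_request, target_utilization, cores_per_server):
    """Return the sizing steps from the worked example as a dict."""
    busy_cores = qps * cpu_ms_per_request / 1000    # core-seconds of work per second
    cores = busy_cores / target_utilization         # cores to provision at the target
    servers = math.ceil(cores / cores_per_server)   # round up to whole machines
    # QPS a single server sustains while staying at the target utilization
    qps_per_server = cores_per_server * target_utilization * 1000 / cpu_ms_per_request
    return {
        "busy_cores": busy_cores,
        "cores": math.ceil(cores),
        "servers": servers,
        "qps_per_server": qps_per_server,
    }


if __name__ == "__main__":
    plan = size_fleet(qps=10_000, cpu_ms_per_request=10,
                      target_utilization=0.7, cores_per_server=8)
    print(plan)
    # Per-box anchor cross-check: 1K-5K QPS/server would suggest far fewer machines.
    low, high = math.ceil(10_000 / 5_000), math.ceil(10_000 / 1_000)
    print(f"anchor estimate: {low} to {high} servers")
```

Comparing `qps_per_server` against the 1–5K anchor makes the gap explicit: when the computed figure falls well below the anchor range, the requests are heavier than the anchor assumes.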

CPU-bound vs IO-bound — this decides HOW you scale

	CPU-bound	IO-bound
Bottleneck	Computation (encoding, ML, compression)	Waiting on DB / network / disk
Concurrency vs cores	N ≈ cores	N ≫ cores (threads mostly wait)
Scale by	Add cores / machines	Add concurrency (threads/async) + fix the slow dependency
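The "Concurrency vs cores" row follows from applying Little’s Law twice: once with wall-clock latency (giving in-flight requests N) and once with CPU time only (giving busy cores). The sketch below uses two made-up workloads to show the contrast; the function name and the 1 ms / 99 ms split are assumptions for illustration, not figures from the lesson.

```python
def concurrency_vs_cores(qps: float, cpu_ms: float, wait_ms: float):
    """Return (in-flight requests, busy cores) for a workload.

    cpu_ms  : CPU time burned per request
    wait_ms : time per request spent waiting on DB / network / disk
    """
    wall_latency_s = (cpu_ms + wait_ms) / 1000
    in_flight = qps * wall_latency_s      # Little's Law with wall-clock latency
    busy_cores = qps * cpu_ms / 1000      # Little's Law with CPU time only
    return in_flight, busy_cores


if __name__ == "__main__":
    # CPU-bound: all 10 ms of each request is computation.
    n, cores = concurrency_vs_cores(qps=10_000, cpu_ms=10, wait_ms=0)
    print(f"CPU-bound: N={n:.0f}, busy cores={cores:.0f}")

    # IO-bound (hypothetical): 1 ms of CPU, 99 ms waiting on a dependency.
    n, cores = concurrency_vs_cores(qps=10_000, cpu_ms=1, wait_ms=99)
    print(f"IO-bound:  N={n:.0f}, busy cores={cores:.0f}")
```

In the CPU-bound case the two numbers coincide, so adding cores is the only lever. In the IO-bound case N is far larger than the busy-core count, which is why threads or async concurrency, plus a faster dependency, matter more than extra cores.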

The kitchen analogy (why 70%, not 100%)

You hire 30 chefs for a 20-dish peak (about 67% busy at peak) so a sudden rush doesn’t halt the kitchen. Run servers at ~70% so latency stays sane under bursts: queueing delay grows sharply as utilization approaches 100%.
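One way to see why queues blow up near 100% is the textbook M/M/1 queue, where mean time in the system is W = S ÷ (1 − ρ) for service time S and utilization ρ. This is a deliberately simplified model (single server, Poisson arrivals, exponential service times) and is an assumption added here for illustration; real multi-core fleets degrade less abruptly, but the shape of the curve is similar.

```python
def mm1_latency_ms(service_ms: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1 - utilization)


if __name__ == "__main__":
    SERVICE_MS = 10  # same 10 ms request cost as the worked example
    for rho in (0.50, 0.70, 0.90, 0.95, 0.99):
        latency = mm1_latency_ms(SERVICE_MS, rho)
        print(f"utilization {rho:.0%}: mean latency ~{latency:.0f} ms")
```

Under this model, moving from 70% to 90% utilization roughly triples mean latency, and the last few percent before saturation cost far more than the first seventy.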

Formulas are standard/public-domain engineering math. Approach and reference-table format adapted from the System Design Primer (CC BY 4.0), Jeff Dean’s latency numbers, the DesignGurus capacity-estimation guide, and Little’s Law.
