CDN

📘 Learn CDN from zero

A CDN (Content Delivery Network) is a geographically distributed fleet of caching servers ("edge" or PoP servers) that sit between your users and your origin server, serving copies of your content from a location physically close to each user. In an interview, "put a CDN in front of it" is the standard first move for any read-heavy system serving static or cacheable content (images, video, JS/CSS, even API responses) — it cuts latency, slashes origin load, and absorbs traffic spikes. Knowing the mechanism, the push-vs-pull trade-off, and the staleness/invalidation cost is what separates a memorized answer from a real one.

✨ Added by the guide to build intuition — not from the source course.

Lessons in this topic

🏗️ Apply it — design walkthrough

Work through this after you've learned the concepts in the lessons above.

Why distance hurts

🤔 A user in Sydney requests a 2 MB image from your single server in Virginia (USA). Even if the server responds in 0 ms of processing time, the user still waits roughly a second. Why?

Light in fiber travels ~200,000 km/s (about 2/3 of light in a vacuum), and Sydney→Virginia is ~16,000 km one way. So one round trip ≈ 160 ms just from physics — and a fresh HTTPS connection needs several round trips before the first byte of data even moves:

TCP handshake: 1 RTT
TLS 1.2 handshake: ~2 RTT (TLS 1.3 cuts this to 1 RTT — worth naming in an interview)
HTTP request + first response byte: 1 RTT

That's ~4 RTT × 160 ms ≈ 640 ms before payload flows, then several more RTTs to actually ship 2 MB (TCP slow-start ramps the window up gradually). Cause → effect: large physical distance → high RTT → multiplied by every handshake and every data round trip → ~1s perceived latency, no matter how fast the server's CPU is. The cost/insight: you cannot fix this with a bigger or faster server — the bottleneck is the speed of light over distance, which is exactly the problem a CDN exists to solve.
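
The arithmetic above can be reproduced in a few lines. This is a back-of-envelope sketch, not a network model: it counts only propagation delay and handshake round trips, and the function names are made up for illustration.

```python
# Back-of-envelope latency model; the constants come from the text above.
FIBER_SPEED_KM_PER_S = 200_000   # roughly 2/3 of the speed of light in a vacuum
DISTANCE_KM = 16_000             # Sydney to Virginia, one way


def rtt_ms(distance_km: float, speed_km_per_s: float = FIBER_SPEED_KM_PER_S) -> float:
    """Propagation-only round-trip time in milliseconds."""
    return 2 * distance_km / speed_km_per_s * 1000


def time_to_first_byte_ms(rtt: float, tls_round_trips: int = 2) -> float:
    """TCP handshake (1 RTT) + TLS handshake + HTTP request/response (1 RTT)."""
    return (1 + tls_round_trips + 1) * rtt


rtt = rtt_ms(DISTANCE_KM)
print(f"RTT: {rtt:.0f} ms")
print(f"TTFB with TLS 1.2: {time_to_first_byte_ms(rtt, 2):.0f} ms")
print(f"TTFB with TLS 1.3: {time_to_first_byte_ms(rtt, 1):.0f} ms")
```

This prints 160 ms, 640 ms and 480 ms. The payload transfer and slow-start round trips are deliberately left out, so the real figure is higher.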

Edge vs origin

🤔 We put a CDN edge server in Sydney, ~50 km from the user, holding a copy of that image. The origin in Virginia is unchanged. What is the role of each server now, and roughly what latency does the user see?

Two distinct roles emerge:

Origin server = the single source of truth. It holds the real, authoritative content and is where writes/updates happen. There are few of these.
Edge server (PoP) = a read-only cache copy placed near users. A real CDN has hundreds of PoPs worldwide.

Cause → effect: edge is ~50 km away → propagation RTT drops from 160 ms to ~1 ms → the handshake + request round trips now cost a few ms instead of ~640 ms. The user perceives the image loading near-instantly (a few ms of network plus the time to stream the bytes over a short, fast last mile). The cost: you now maintain copies in many places, so an edge copy can be stale relative to origin — you've traded perfect freshness for speed. Managing that staleness is the rest of this lesson.

How a cache miss works

🤔 The Sydney edge is brand new and empty. The first user requests /logo.png, which the edge has never seen. Walk through what physically happens — and why is the SECOND Sydney user so much faster?

This is the pull / cache-miss path, step by step:

User 1 hits Sydney edge → edge checks its store → MISS.
Edge forwards the request to the Virginia origin (~160 ms RTT, the slow path).
Origin returns logo.png + caching headers (e.g. Cache-Control: max-age=86400).
Edge stores a copy, then returns it to User 1. User 1 paid the full ~1s — no faster than having no CDN.
User 2 requests /logo.png → edge checks store → HIT → served locally in a few ms.

Cause → effect: the first request warms the cache (slow) → every subsequent request within the TTL is a local hit (fast). If 100,000 Sydney users request the logo, ~1 paid the origin round trip and the rest got the fast local hit. The cost: the first user (and the first user after every expiry) eats the full latency — the "cold cache" penalty. (When TTL expires, well-behaved CDNs revalidate with a conditional request — If-None-Match/304 Not Modified — so the refresh is cheap if the object hasn't changed.)
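
The miss-then-hit flow above can be sketched as a toy pull-through cache. Everything here is an illustrative assumption (the EdgeCache class, the fake_origin function, the injectable clock); a real edge also handles revalidation, eviction and response headers, which this sketch skips.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy pull-through cache: fetch from origin on a miss, serve locally on a hit."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin client
        self._ttl = ttl_seconds
        self._clock = clock                  # injectable so expiry can be tested
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expires_at)

    def get(self, path: str) -> Tuple[str, bytes]:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now < entry[1]:
            return "HIT", entry[0]           # fast path: no origin round trip
        body = self._fetch(path)             # slow path: go to origin
        self._store[path] = (body, now + self._ttl)
        return "MISS", body


origin_calls = []

def fake_origin(path: str) -> bytes:
    origin_calls.append(path)                # record every request that reaches origin
    return b"logo-bytes"

edge = EdgeCache(fake_origin, ttl_seconds=86_400)
print(edge.get("/logo.png")[0])   # first user: MISS
print(edge.get("/logo.png")[0])   # second user: HIT
print(len(origin_calls))          # origin was contacted once
```

The TTL is fixed per cache here for brevity; in practice it would come from the Cache-Control header on each origin response.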

Origin load reduction

🤔 You serve 1,000,000 requests/sec for static assets, and your CDN achieves a 95% cache hit rate. How many requests/sec actually reach your origin — and why does this matter more than latency for some teams?

Hit rate is the fraction of requests served by the edge without touching origin.

Edge serves: 95% × 1,000,000 = 950,000 req/s
Origin sees only the misses: 5% × 1,000,000 = 50,000 req/s

Cause → effect: 95% hit rate → origin request load drops 20× → you can run far fewer/cheaper origin servers and survive traffic spikes. If a product goes viral and traffic 10×'s, the edge absorbs the surge; origin barely notices as long as the hit rate holds. This "shield" function is separate from the latency win. The cost: hit rate is everything, and it is non-linear — going from 95% to 99% hit rate cuts origin load by another 5× (5% → 1%). Conversely, content that is highly personalized or constantly changing has a low hit rate, so a CDN gives almost none of this benefit and just adds an extra network hop. CDNs pay off for content that is the same for many users.
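
The non-linearity is easy to see by computing origin load at a few hit rates. A minimal sketch, assuming every miss becomes exactly one origin request (no coalescing, no retries); origin_rps is a name invented for this example.

```python
def origin_rps(total_rps: float, hit_rate: float) -> int:
    """Requests per second that miss the edge and reach origin."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return round(total_rps * (1.0 - hit_rate))


for hit_rate in (0.0, 0.95, 0.99):
    print(f"hit rate {hit_rate:.0%}: {origin_rps(1_000_000, hit_rate):,} req/s reach origin")
```

At 1,000,000 req/s this gives 1,000,000, then 50,000, then 10,000 req/s at origin: the last four points of hit rate remove as many origin requests as a 5× capacity increase would absorb.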

Push vs Pull CDN

🤔 You're choosing how content gets onto the edges. Option A: edges fetch on first request (lazy). Option B: you upload everything to all edges up front. When does each win — and what breaks if you pick wrong?

Pull CDN (Option A — the common default):

Mechanism: edge pulls from origin on a miss, then caches (the flow traced in "How a cache miss works" above). You just set caching headers; the CDN decides what to keep based on demand and evicts cold objects (typically LRU) when storage fills.
Best for: large catalogs where only a fraction is hot (e.g. millions of product images, but ~90% of traffic hits ~10% of them).
Cost: the first request per asset per edge is slow (cold-cache penalty), and you have less direct control over exactly what's cached where.

Push CDN (Option B):

Mechanism: you proactively upload/replicate content to edges before users ask.
Best for: a small set of large, high-value files you KNOW will be hot — e.g. a new game patch or a launch-day video. No cold-cache penalty, ever.
Cost: you replicate everything to edges whether or not it's requested (wasteful storage if much is never used), and you own the work of pushing updates and re-pushing on every change.

Cause → effect: unpredictable/long-tail traffic → Pull (let demand decide what to cache); predictable hot launches → Push (pre-warm to kill the cold start). Pick Push for a huge sparse catalog and you waste storage replicating cold objects; pick Pull for a synchronized global launch and the first wave of users in every region eats the slow origin round trip. (Note: many modern CDNs blur this — "pull" CDNs often expose a pre-warm/prefetch API to get the best of both.)
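
The LRU eviction mentioned under Pull is what lets demand decide what stays on an edge. Here is a minimal sketch of that policy, assuming capacity is counted in objects rather than bytes (real edges budget by size); LRUStore is a hypothetical name.

```python
from collections import OrderedDict
from typing import Optional


class LRUStore:
    """Fixed-capacity store that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the coldest entry


store = LRUStore(capacity=2)
store.put("/hot.jpg", b"h")
store.put("/cold.jpg", b"c")
store.get("/hot.jpg")            # touch: /hot.jpg is now most recent
store.put("/new.jpg", b"n")      # over capacity, so /cold.jpg is evicted
print(store.get("/cold.jpg"))    # None
print(store.get("/hot.jpg"))     # b'h'
```

With a Push CDN there is no such automatic selection: whatever you upload occupies edge storage until you remove it.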

The staleness problem

🤔 You set Cache-Control: max-age=86400 (1 day) on /price.json, then you change the price at origin. A user in Tokyo loads the page. What price do they see, and for how long?

When the edge cached the object, it recorded a freshness promise: "this is fresh for 86,400 seconds from when I fetched it." It will keep serving that copy until its own timer expires, regardless of what origin now says — the edge has no idea the price changed.

Worst case: a Tokyo edge fetched the price 1 second before your change → it serves the old price for almost the full 86,400 s (up to ~24h) until its TTL lapses.
Each edge started its timer when it first fetched, so edges expire at different wall-clock times → users in different cities can see different prices simultaneously.

Cause → effect: long TTL → great hit rate + low origin load → BUT data can be stale for up to one full TTL. This is the core CDN trade-off: freshness and hit rate are in direct tension. Short TTL (e.g. 60 s) → fresher data but more origin traffic and a lower hit rate; long TTL → fewer origin hits but staler data. The cost: there is no free lunch — you tune TTL per content type (immutable logo = 1 year, price = ~60 s, breaking news = no-cache, i.e. revalidate every time).
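
To make the staleness window concrete, the sketch below computes how long one edge keeps serving the old copy after origin changes. It assumes the edge never revalidates early and is never purged; stale_seconds and the timestamps are illustrative.

```python
def stale_seconds(fetched_at: float, changed_at: float, ttl: float) -> float:
    """Seconds an edge keeps serving the old copy after origin changes."""
    expires_at = fetched_at + ttl
    if changed_at < fetched_at or changed_at >= expires_at:
        return 0.0   # edge fetched after the change, or its copy had already expired
    return expires_at - changed_at


ONE_DAY = 86_400

# Worst case from the text: the edge fetched 1 second before the price changed.
print(stale_seconds(fetched_at=0, changed_at=1, ttl=ONE_DAY))   # 86399.0
print(stale_seconds(fetched_at=0, changed_at=1, ttl=60))        # 59.0

# Two edges that fetched at different times disagree about when the old price dies.
change = 50_000
print(stale_seconds(fetched_at=0, changed_at=change, ttl=ONE_DAY))       # 36400.0
print(stale_seconds(fetched_at=40_000, changed_at=change, ttl=ONE_DAY))  # 76400.0
```

The last two lines show why users in different cities can see different prices at the same moment: each edge runs its own timer.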

Forcing freshness fast

🤔 You shipped a bug in /app.js and it's cached with a 1-year TTL across 300 edges. You can't wait for TTL to expire. What are your two real options to get the fix to every user in minutes?

Option 1 — Cache invalidation / purge: tell the CDN "evict /app.js from all edges now." The next request at each edge becomes a miss → pulls the fixed file from origin.

Cause → effect: purge → forced misses across all edges → a brief origin load spike as edges refetch. Purges also aren't instant — they propagate across the network over seconds to a couple of minutes.

Option 2 — Cache busting via versioned URLs (the preferred fix): never overwrite; change the path. Ship /app.v2.js (or /app.js?v=hash) and update the HTML to reference it.

Cause → effect: new URL → guaranteed miss on a path nothing has ever cached → instantly fresh, no purge needed. The old app.js just ages out of caches harmlessly.
This is why build tools fingerprint assets (main.a1b2c3.js): the content hash in the name lets you set Cache-Control: max-age=31536000, immutable (one year) on the asset AND still deploy instantly, the best of both worlds. (Caveat: the HTML/manifest that references these files must itself be short-TTL or revalidated, or users keep loading the old references.)

The cost/insight: purge is reactive and risks an origin thundering herd; versioned URLs are proactive and sidestep the problem entirely, but require your build/deploy pipeline to generate the fingerprinted names and rewrite every reference. Interview takeaway: prefer immutable, versioned assets so you rarely need to purge at all.
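
Fingerprinting is simple enough to sketch directly. This is an assumption-laden illustration, not how any particular build tool does it: the fingerprint function, the 8-character SHA-256 prefix, and the /static/ path are all made up for the example.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint(path: str, content: bytes, digest_chars: int = 8) -> str:
    """Return the path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = fingerprint("/static/app.js", b"console.log('buggy');")
new = fingerprint("/static/app.js", b"console.log('fixed');")
print(old)   # /static/app.<hash of buggy build>.js
print(new)   # /static/app.<hash of fixed build>.js
```

Because the name is derived from the bytes, changed content always lands on a URL no edge has cached, while unchanged files keep their name and stay cached across deploys.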

Cache stampede

🤔 A celebrity video is the hottest object on your CDN — 50,000 req/s hitting one edge, all served from cache. Its TTL expires at exactly 12:00:00. What happens at 12:00:00, and how do you prevent the disaster?

The instant the TTL lapses, the cached copy is no longer fresh. The naive edge behavior:

At 12:00:00 the next request is a MISS → edge goes to origin to refetch.
But the refetch takes ~160 ms, and at 50,000 req/s about 8,000 more requests (50,000 × 0.16 s) arrive during that window and also see a MISS.
If each miss independently fires its own origin fetch, the edge sends thousands of simultaneous identical requests to origin for one object — the cache stampede / thundering herd. Origin can buckle, and ironically the failure happens precisely on your most popular content.

Cause → effect: synchronized expiry of a hot object → a burst of concurrent misses → a flood of duplicate origin fetches → origin overload. The standard defenses:

Request coalescing / collapsing (the primary fix): the edge lets the first miss go to origin and makes all other concurrent requests for the same key wait for that single in-flight fetch, then serves them all from the one response. The ~8,000 requests that arrive during a 160 ms refetch (50,000 req/s × 0.16 s) → 1 origin request.
Stale-while-revalidate: keep serving the slightly-stale copy to users while one background request refreshes it. Users never block on origin; freshness lags by one fetch.
TTL jitter: add a small random offset to TTLs so many objects (and many edges) don't all expire at the same instant, smoothing origin load over time.

The cost/insight: these protect origin but each trades something. Coalescing makes every request that arrives during the refetch wait on that one in-flight fetch; stale-while-revalidate deliberately serves stale data for one refresh cycle; jitter means you can't reason about an exact expiry time. The deep takeaway: at scale, the danger isn't the steady miss rate but correlated misses all hitting origin at once, and the fix is to ensure only one fetch per key reaches origin.
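
Request coalescing can be sketched with a lock and one shared "flight" per key. This is a thread-based toy under stated assumptions: Coalescer, _Flight and slow_origin are hypothetical names, the 0.3 s sleep stands in for the origin round trip, and the result is not stored in a cache afterwards (a real edge would do both).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


class _Flight:
    """State shared by every request waiting on one in-flight origin fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.body: Optional[bytes] = None
        self.error: Optional[BaseException] = None


class Coalescer:
    """Collapse concurrent fetches for the same key into one origin call."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._flights: Dict[str, _Flight] = {}

    def get(self, key: str) -> bytes:
        with self._lock:
            flight = self._flights.get(key)
            is_leader = flight is None
            if is_leader:                    # first miss: this request goes to origin
                flight = _Flight()
                self._flights[key] = flight
        if is_leader:
            try:
                flight.body = self._fetch(key)
            except Exception as exc:
                flight.error = exc           # followers see the same failure
            finally:
                with self._lock:
                    del self._flights[key]
                flight.done.set()            # wake everyone waiting on this fetch
        else:
            flight.done.wait()               # follower: wait, do not contact origin
        if flight.error is not None:
            raise flight.error
        return flight.body


origin_calls = []

def slow_origin(key: str) -> bytes:
    origin_calls.append(key)     # list.append is atomic under CPython's GIL
    time.sleep(0.3)              # stand-in for the slow origin round trip
    return b"video-bytes"

edge = Coalescer(slow_origin)
with ThreadPoolExecutor(max_workers=20) as pool:
    bodies = list(pool.map(lambda _: edge.get("/celebrity.mp4"), range(20)))
print(len(bodies), "requests served by", len(origin_calls), "origin fetch(es)")
```

All 20 requests arrive while the first fetch is still in flight, so origin should see a single call. Without the shared flight, each of them would have fired its own fetch.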

🎯 Guided practice

Problem 1 (easy): Should this use a CDN, and what's the hit/miss flow?

Scenario: a photo-sharing app serves the same uploaded images to viewers worldwide. Ask the recognition question — is the content static and is read >> write? An image, once uploaded, is read many times and rarely changes, so yes.
Are users global? Yes. A single origin means distant users pay a fixed round-trip latency tax on every fetch, so reach for a CDN.
Trace the flow (Xu's workflow): the first viewer in a region requests the image via a CDN domain → cache miss → the edge fetches from origin once and caches it with the TTL returned on the origin response. Subsequent viewers in that region → cache hit → served from the nearby edge in single-digit to low-tens of ms.
Pattern learned: CDN value scales with (read frequency) × (geographic spread) × (object staticness). High on all three is a strong fit; the offloaded reads also shield the origin.

Problem 2 (medium): Pull vs Push for a viral video site with a small team.

Characterize the content: video files are large, the catalog is big, and you can't predict which clips go viral — so you don't know in advance what to pre-place.
Evaluate Push: you'd upload every video to the CDN ahead of time. That replicates huge volumes of data, much of which may never be requested in a given region — wasteful storage and heavy operational work for a small team. Push fits low-traffic sites or content that rarely changes, which is the opposite of this.
Evaluate Pull: edges fetch a video on first request in a region and cache it for the TTL. Only actually-popular videos get replicated, automatically following demand. Cost: the first viewer per region per video pays the cache-miss latency, and a too-short TTL can cause repeated re-pulls from origin.
Decision: choose Pull CDN — a large, unpredictable, frequently-changing catalog with limited ops favors lazy, demand-driven caching. (Flip it if content were small, stable, and traffic low — then Push avoids the first-hit penalty and gives explicit control over placement and expiry.)
Pattern learned: Pull = lazy, demand-driven, low effort, first-hit penalty, minimal wasted storage. Push = eager, you control placement/expiry, no first-hit penalty, but wasted storage if you guess wrong. Match the strategy to demand predictability and operational capacity.

✨ Added by the guide — work these before the full problem set.