Build a Rate Limiter
Build a Rate Limiter
Rate limiting is the first line of defence for any backend: it protects a service from overload, abuse, and runaway clients, and it's asked about in almost every system-design interview. You already know the theory — this project makes you build a production-shaped one from an empty file, in Go and Java, and prove it works under concurrency. That gap — from "I can explain a token bucket" to "I've shipped one and debugged its race" — is exactly what separates a strong candidate from the engineer a top team hires.
Play with it first
Before you write code, build intuition. The simulator below runs all four classic algorithms against live traffic — drag the limit, burst capacity, and incoming rate, switch the traffic pattern to Bursty, and watch which algorithm absorbs the spike and which throttles hard. Notice how Token Bucket lets a burst through up to its capacity while Leaky Bucket smooths output to a flat rate.
The spec
Build a limiter with this contract, then grow it in milestones:
- Core:
allow() → boolthat permits at most R requests per second on average, while absorbing a burst of up to C. - Thread-safe: correct under many concurrent callers (this is where the real bugs live).
- Per-key: independent limits per client/API-key/IP (multi-tenant).
- Cheap: O(1) time and memory per check — no background threads, no timers. Use lazy refill: compute tokens from elapsed time on each call.
Milestones
- M1 — Single-key token bucket. Lazy refill,
allow()returns true/false. Get the math right first, single-threaded. - M2 — Make it thread-safe. One lock around the read-modify-write of
tokens. Prove it with a concurrent test. - M3 — Per-key. A concurrent map of key → bucket, created on first use. Mind the map's own concurrency.
- M4 — Distributed. Move the state to Redis with an atomic Lua script so N app servers share one limit.
Reference implementation — Token Bucket (Go)
Lazy refill is the whole trick: instead of a timer topping up tokens, we compute how many would have been added since the last call. O(1), no goroutines.
type TokenBucket struct {
mu sync.Mutex
capacity float64 // max burst
tokens float64 // current tokens
refillRate float64 // tokens added per second (the steady limit)
last time.Time
}
func NewTokenBucket(ratePerSec, capacity float64) *TokenBucket {
return &TokenBucket{
capacity: capacity, tokens: capacity,
refillRate: ratePerSec, last: time.Now(),
}
}
// Allow reports whether one request may proceed now.
func (tb *TokenBucket) Allow() bool {
tb.mu.Lock()
defer tb.mu.Unlock()
now := time.Now()
elapsed := now.Sub(tb.last).Seconds()
tb.tokens = math.Min(tb.capacity, tb.tokens+elapsed*tb.refillRate) // lazy refill
tb.last = now
if tb.tokens >= 1 {
tb.tokens--
return true
}
return false
}
Per-key (M3): a sharded/So concurrent map, buckets created on demand.
type Limiter struct {
mu sync.Mutex
buckets map[string]*TokenBucket
rate, cap float64
}
func (l *Limiter) Allow(key string) bool {
l.mu.Lock()
b, ok := l.buckets[key]
if !ok {
b = NewTokenBucket(l.rate, l.cap)
l.buckets[key] = b
}
l.mu.Unlock() // release the map lock before the per-bucket lock
return b.Allow()
}
Reference implementation — Token Bucket (Java)
public final class TokenBucket {
private final double capacity;
private final double refillPerSec;
private double tokens;
private long lastNanos;
public TokenBucket(double ratePerSec, double capacity) {
this.capacity = capacity;
this.refillPerSec = ratePerSec;
this.tokens = capacity;
this.lastNanos = System.nanoTime();
}
public synchronized boolean allow() {
long now = System.nanoTime();
double elapsed = (now - lastNanos) / 1_000_000_000.0;
tokens = Math.min(capacity, tokens + elapsed * refillPerSec); // lazy refill
lastNanos = now;
if (tokens >= 1.0) { tokens -= 1.0; return true; }
return false;
}
}
// Per-key (M3): computeIfAbsent gives atomic create-on-first-use.
public final class Limiter {
private final Map<String, TokenBucket> buckets = new ConcurrentHashMap<>();
private final double rate, cap;
public Limiter(double rate, double cap) { this.rate = rate; this.cap = cap; }
public boolean allow(String key) {
return buckets.computeIfAbsent(key, k -> new TokenBucket(rate, cap)).allow();
}
}
Tests to write (this is the real learning)
- Burst then throttle: with rate=5, capacity=10, fire 10 immediately → all allowed; the 11th → rejected. This proves the bucket absorbs a burst of exactly
capacity. - Refill: after exhausting tokens, sleep 1s → ~5 more allowed. Proves the average rate.
- Concurrency: launch 100 goroutines/threads hammering
allow(); assert the allowed count never exceedscapacity + rate·duration. Run with-race(Go) / a stress loop (Java) — remove the lock and watch this test fail. That failure is the lesson. - Monotonic clock: confirm you use a monotonic source (
time.Nowin Go is monotonic;System.nanoTime()in Java) — wall-clockcurrentTimeMilliscan jump backward on NTP sync and corrupt the refill.
Extensions — take it further
- Sliding-window counter for a smoother limit without the token-bucket burst allowance (see the simulator's Sliding Window mode).
- Distributed (M4): a Redis
EVALLua script that does the read-refill-write atomically on the server, so N app instances share one limit. Watch out for clock source (use RedisTIME) and the added network round-trip. - GCRA (Generic Cell Rate Algorithm): the token bucket reformulated as a single "theoretical arrival time" timestamp — O(1) state, no stored token count. It's what many production limiters (and Redis-cell) actually use.
- Return headers: emit
X-RateLimit-RemainingandRetry-Afterso clients can back off politely.
What you've mastered when this is done
- You can implement and defend all four limiter algorithms and say when each wins (burst tolerance vs smoothing vs memory vs boundary bug).
- You've written — and broken — a concurrency test, so you understand the race a rate limiter's shared counter creates.
- You can extend a single-node limiter to a distributed one and name the new failure modes (clock skew, the extra round-trip, atomicity).
🤖 Don't fully get this? Learn it with Claude
Stuck on Build a Rate Limiter? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.
Build the mental picture, not memorization.
I just read a lesson on **Build a Rate Limiter** (Hands-On Builds) and want to truly understand it. Explain Build a Rate Limiter from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Socratic — adapts to where you're stuck.
Teach me **Build a Rate Limiter** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Active recall exposes what you missed.
Quiz me on **Build a Rate Limiter** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Intuition + hook + flashcards for long-term memory.
Help me remember **Build a Rate Limiter** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.