Home › Hands-On Builds › Backend Primitives

Build a Rate Limiter

Rate limiting is the first line of defence for any backend: it protects a service from overload, abuse, and runaway clients, and it's asked about in almost every system-design interview. You already know the theory — this project makes you build a production-shaped one from an empty file, in Go and Java, and prove it works under concurrency. That gap — from "I can explain a token bucket" to "I've shipped one and debugged its race" — is exactly what separates a strong candidate from the engineer a top team hires.

Play with it first

Before you write code, build intuition. The simulator below runs all four classic algorithms against live traffic — drag the limit, burst capacity, and incoming rate, switch the traffic pattern to Bursty, and watch which algorithm absorbs the spike and which throttles hard. Notice how Token Bucket lets a burst through up to its capacity while Leaky Bucket smooths output to a flat rate.

The spec

Build a limiter with this contract, then grow it in milestones:

Core: allow() → bool that permits at most R requests per second on average, while absorbing a burst of up to C.
Thread-safe: correct under many concurrent callers (this is where the real bugs live).
Per-key: independent limits per client/API-key/IP (multi-tenant).
Cheap: O(1) time and memory per check — no background threads, no timers. Use lazy refill: compute tokens from elapsed time on each call.

Milestones

M1 — Single-key token bucket. Lazy refill, allow() returns true/false. Get the math right first, single-threaded.
M2 — Make it thread-safe. One lock around the read-modify-write of tokens. Prove it with a concurrent test.
M3 — Per-key. A concurrent map of key → bucket, created on first use. Mind the map's own concurrency.
M4 — Distributed. Move the state to Redis with an atomic Lua script so N app servers share one limit.

Reference implementation — Token Bucket (Go)

Lazy refill is the whole trick: instead of a timer topping up tokens, we compute how many would have been added since the last call. O(1), no goroutines.

type TokenBucket struct {
    mu         sync.Mutex
    capacity   float64   // max burst
    tokens     float64   // current tokens
    refillRate float64   // tokens added per second (the steady limit)
    last       time.Time
}

func NewTokenBucket(ratePerSec, capacity float64) *TokenBucket {
    return &TokenBucket{
        capacity: capacity, tokens: capacity,
        refillRate: ratePerSec, last: time.Now(),
    }
}

// Allow reports whether one request may proceed now.
func (tb *TokenBucket) Allow() bool {
    tb.mu.Lock()
    defer tb.mu.Unlock()

    now := time.Now()
    elapsed := now.Sub(tb.last).Seconds()
    tb.tokens = math.Min(tb.capacity, tb.tokens+elapsed*tb.refillRate) // lazy refill
    tb.last = now

    if tb.tokens >= 1 {
        tb.tokens--
        return true
    }
    return false
}

Per-key (M3): a sharded/So concurrent map, buckets created on demand.

type Limiter struct {
    mu       sync.Mutex
    buckets  map[string]*TokenBucket
    rate, cap float64
}

func (l *Limiter) Allow(key string) bool {
    l.mu.Lock()
    b, ok := l.buckets[key]
    if !ok {
        b = NewTokenBucket(l.rate, l.cap)
        l.buckets[key] = b
    }
    l.mu.Unlock()      // release the map lock before the per-bucket lock
    return b.Allow()
}

Reference implementation — Token Bucket (Java)

public final class TokenBucket {
    private final double capacity;
    private final double refillPerSec;
    private double tokens;
    private long lastNanos;

    public TokenBucket(double ratePerSec, double capacity) {
        this.capacity = capacity;
        this.refillPerSec = ratePerSec;
        this.tokens = capacity;
        this.lastNanos = System.nanoTime();
    }

    public synchronized boolean allow() {
        long now = System.nanoTime();
        double elapsed = (now - lastNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsed * refillPerSec); // lazy refill
        lastNanos = now;
        if (tokens >= 1.0) { tokens -= 1.0; return true; }
        return false;
    }
}

// Per-key (M3): computeIfAbsent gives atomic create-on-first-use.
public final class Limiter {
    private final Map<String, TokenBucket> buckets = new ConcurrentHashMap<>();
    private final double rate, cap;
    public Limiter(double rate, double cap) { this.rate = rate; this.cap = cap; }
    public boolean allow(String key) {
        return buckets.computeIfAbsent(key, k -> new TokenBucket(rate, cap)).allow();
    }
}

Tests to write (this is the real learning)

Burst then throttle: with rate=5, capacity=10, fire 10 immediately → all allowed; the 11th → rejected. This proves the bucket absorbs a burst of exactly capacity.
Refill: after exhausting tokens, sleep 1s → ~5 more allowed. Proves the average rate.
Concurrency: launch 100 goroutines/threads hammering allow(); assert the allowed count never exceeds capacity + rate·duration. Run with -race (Go) / a stress loop (Java) — remove the lock and watch this test fail. That failure is the lesson.
Monotonic clock: confirm you use a monotonic source (time.Now in Go is monotonic; System.nanoTime() in Java) — wall-clock currentTimeMillis can jump backward on NTP sync and corrupt the refill.

Extensions — take it further

Sliding-window counter for a smoother limit without the token-bucket burst allowance (see the simulator's Sliding Window mode).
Distributed (M4): a Redis EVAL Lua script that does the read-refill-write atomically on the server, so N app instances share one limit. Watch out for clock source (use Redis TIME) and the added network round-trip.
GCRA (Generic Cell Rate Algorithm): the token bucket reformulated as a single "theoretical arrival time" timestamp — O(1) state, no stored token count. It's what many production limiters (and Redis-cell) actually use.
Return headers: emit X-RateLimit-Remaining and Retry-After so clients can back off politely.

What you've mastered when this is done

You can implement and defend all four limiter algorithms and say when each wins (burst tolerance vs smoothing vs memory vs boundary bug).
You've written — and broken — a concurrency test, so you understand the race a rate limiter's shared counter creates.
You can extend a single-node limiter to a distributed one and name the new failure modes (clock skew, the extra round-trip, atomicity).

🤖 Don't fully get this? Learn it with Claude

Stuck on Build a Rate Limiter? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🎨 Explain it visually

Build the mental picture, not memorization.

I just read a lesson on **Build a Rate Limiter** (Hands-On Builds) and want to truly understand it. Explain Build a Rate Limiter from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🤔 Walk me through it (interactive)

Socratic — adapts to where you're stuck.

Teach me **Build a Rate Limiter** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧪 Quiz me & fix my gaps

Active recall exposes what you missed.

Quiz me on **Build a Rate Limiter** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧠 Make it stick

Intuition + hook + flashcards for long-term memory.

Help me remember **Build a Rate Limiter** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

Build a Rate Limiter