Home › Concurrency › Lock-Free & Memory Ordering

Safe Memory Reclamation (Hazard Pointers, Epochs, RCU)

The problem: you can't just `free()` a node someone might be reading

Lock-free code lets many threads touch a shared structure with no mutual exclusion, so a reader can hold a raw pointer to a node at the exact instant another thread unlinks and frees it — the reader then dereferences memory the allocator has already handed to something else. Safe Memory Reclamation (SMR) is the discipline that answers the one question a lock never has to ask: when is it provably safe to reclaim this memory, given no reader announced they were done?

With a mutex the answer is trivial — the node is unreachable and no one else can be inside the critical section. Lock-free readers take no lock, so "unlinked" does not imply "unreferenced." A thread can have loaded the pointer before the unlink and be stalled (preempted, page fault, migrated) for milliseconds while still holding it. Free the node and that stalled thread wakes into a use-after-free: a silent read of poisoned bytes, a crash, or — worse — a correctness bug when the freed slot has been recycled into an unrelated allocation.

Why this is not the same as ABA

People conflate two distinct hazards. ABA is a logical bug: a CAS on head succeeds because the pointer value returned to A (address 0x40) even though the stack changed A→B→A underneath it; the value matches but the meaning changed. ABA is fixed with tags/version counters or double-width CAS. Use-after-free is a memory-lifetime bug: the address 0x40 is not just logically stale, its backing memory has been returned to the allocator. Tagged pointers do not fix this — you still touch freed memory. SMR is specifically about lifetime, and in fact it also dissolves ABA as a side effect: if a node cannot be freed while any reader references it, its address cannot be recycled to reappear as a false "A."

Approach 1 — Hazard Pointers: readers announce what they hold

The mechanism is per-reader publication: before dereferencing a shared pointer, a thread writes that pointer into a single-writer, globally-readable hazard pointer slot, then re-validates that the pointer is still live. A reclaimer never frees a node directly — it collects retired nodes on a private list and, periodically, scans every thread's hazard slots; a retired node is freed only if it appears in no slot. The scan is the safety proof: if the node is hazarded, some reader has announced it and reclamation is deferred.

The subtlety is the ordering. Publishing the pointer and re-reading the source must be sequenced so the reclaimer either sees the reader's announcement or the reader sees the node already gone. A store-load fence (or seq_cst) closes the window between "I loaded the pointer" and "I published it."

// C++11. One single-writer hazard slot per thread.
std::atomic<Node*> head;
std::atomic<Node*> hp[MAX_THREADS];   // hp[tid] is written only by tid

Node* protect(int tid) {
    Node* p;
    do {
        p = head.load(std::memory_order_acquire);
        hp[tid].store(p, std::memory_order_seq_cst); // announce intent
    } while (p != head.load(std::memory_order_acquire)); // re-validate
    return p;   // safe: a reclaimer scanning hp[] will see p and skip it
}

void retire(int tid, Node* n) {           // called after unlinking n
    retired[tid].push_back(n);
    if (retired[tid].size() >= 2 * MAX_THREADS) scan(tid); // amortized
}

void scan(int tid) {
    std::unordered_set<Node*> hazarded;
    for (int i = 0; i < MAX_THREADS; i++)              // snapshot all slots
        if (Node* h = hp[i].load(std::memory_order_seq_cst)) hazarded.insert(h);
    std::vector<Node*> keep;
    for (Node* n : retired[tid])
        if (hazarded.count(n)) keep.push_back(n);       // still referenced
        else delete n;                                  // provably safe
    retired[tid].swap(keep);
}

The while (p != head.load()) re-check is the crux and the most-often-omitted line: it defends against the node being unlinked and freed between the load and the publish. If the pointer changed, the announcement referred to a stale node, so retry.

Hazard pointers, traced

Say a Treiber stack holds N1→N2→N3. Thread 0 calls protect(0): loads head=N1, stores hp[0]=N1, re-reads head=N1 — match, proceeds to read N1. Thread 1 now pops N1 (CAS head=N2) and calls retire(N1); N1 lands on thread 1's retired list but is not freed. When thread 1's list crosses its threshold it scans: it snapshots hp[0]=N1, sees N1 is hazarded, and keeps it. Only after thread 0 finishes and overwrites hp[0] (with the next node, or null) will a later scan free N1. The stalled reader is never chasing freed memory — the price is that N1's reclamation is deferred and one store+fence was added to every read.

Approach 2 — Epoch-Based Reclamation (EBR): coarse but nearly free reads

The mechanism trades per-object bookkeeping for a single monotonic global epoch counter. A reader entering a critical section reads the global epoch and pins itself to it (publishing "active in epoch e"); it clears that when it exits. Retired nodes are stamped with the current epoch and bucketed into per-epoch limbo lists. The reclaimer advances the global epoch only when every active thread has been observed at the current epoch, and it is then safe to free everything two epochs back — because no thread can still hold a reference from that old epoch.

The win is on the read side: pinning is a plain load of the global counter plus a store of a thread-local slot — no per-object hazard publication, and typically no CPU-level fence instruction is needed on the pin path itself if the store already carries release semantics (a plain relaxed store there would let the reclaimer observe the pin too late). The ordering that actually must hold is: the pin's store publishing "active at epoch e" needs release semantics so it is visible before any subsequent access inside the critical section, and the reclaimer's read of every thread's pinned epoch needs to happen-after that release — in practice this is a store-load pattern (release store on pin, later acquire/fence when the reclaimer scans pins to decide the epoch can advance). Implementations like crossbeam-epoch still pay for this: the pin operation issues a store with Release ordering and the epoch-advance logic uses a SeqCst fence when collecting, because without a real store-load barrier somewhere a reader's pin and the reclaimer's decision to advance can reorder on architectures with relaxed memory models (x86 gets this cheaply via its strong model; ARM/POWER do not). So EBR is cheaper than hazard pointers — one shared fence amortized across all objects touched in a critical section rather than one per object — but it is not fence-free; it just pays the ordering cost once per critical section instead of once per pointer.

EBR traced

Global epoch e = 10. Thread A enters: publishes local[A] = 10, active. Thread B retires node X → limbo[10].push(X).
Reclaimer wants to advance. It checks all active threads are at epoch 10. They are → bump global to e = 11.
New readers pin to 11. Any thread still pinned at 10 blocks freeing of limbo[10] — but limbo[9] is now safe: nobody can hold a reference two epochs old, so free limbo[9].
Once all active threads reach 11 and the epoch bumps to 12, limbo[10] (containing X) is reclaimed.

Approach 3 — RCU (Read-Copy-Update): the reader path is (almost) nothing

RCU is EBR's idea pushed to its cleanest extreme, designed so that on the read side there is zero atomic operation and zero write to shared memory. The mechanism: readers bracket their access in rcu_read_lock()/rcu_read_unlock(), which in the classic non-preemptible kernel build compile to nothing more than disabling preemption — no atomics, no fences on the common architectures, essentially free. Updaters follow copy-on-write: allocate a new version, initialize it, then flip a single pointer with a release store (rcu_assign_pointer) so new readers see the new version and in-flight readers keep the old one. The old version is freed only after a grace period — the moment when every CPU has passed through at least one quiescent state (a context switch, idle, or user-mode return), which proves no reader that could still hold the old pointer remains.

"Read, Copy, Update": readers read the current version cheaply; updaters copy and swap; reclamation waits for a grace period. The updater either blocks in synchronize_rcu() until the grace period elapses, or registers a callback with call_rcu() and moves on while a kernel thread frees the memory later.

Where the Linux kernel uses RCU — and why it's the default there

RCU was merged into Linux 2.5 (2002) and is now one of the kernel's most heavily-used synchronization primitives, precisely because kernel data structures are overwhelmingly read-mostly and readers must be fast and never block. Real users:

The dentry cache (dcache) — path-name lookup (rcu-walk) traverses directory entries with RCU readers, so millions of open()/stat() calls walk paths without taking per-entry locks; the slow ref-walk path is the fallback.
Networking — routing tables, the FIB, netfilter rules, and socket lookups are read on every packet but updated rarely; RCU lets the packet-forwarding fast path run lock-free.
The VFS and module lists, IDR/radix trees, SELinux policy, and /proc iteration — anywhere a structure is read constantly and mutated occasionally.

The kernel exposes the full API — rcu_read_lock, rcu_dereference, rcu_assign_pointer, synchronize_rcu, call_rcu — plus variants (srcu for sleepable readers, rcu_bh, Tree RCU for scalability across many CPUs). Userspace gets the same via liburcu. This is the canonical citation: Paul McKenney's RCU work is the reference SMR implementation in production.

Pitfalls a working engineer actually hits

Omitting the hazard-pointer re-validation. Publishing hp[tid]=p without the subsequent while (p != head.load()) re-check reintroduces the exact race SMR was meant to kill — the node can be freed between load and publish. This is the single most common hazard-pointer bug.
Missing the store-load fence. Hazard-pointer correctness needs the publish (store) to be visible before the re-read (load); relaxed ordering lets the reclaimer's scan and the reader's announcement reorder, and a node gets freed under an active reader. Use seq_cst on the slot or an explicit fence.
Blocking or sleeping inside an EBR/RCU read section. A reader that never un-pins (blocks on I/O, sleeps, hits a long loop) stalls epoch advancement / the grace period forever — limbo lists grow until OOM. Classic RCU readers must not sleep; use SRCU if they must.
Forgetting rcu_dereference/rcu_assign_pointer. A plain load of the RCU-protected pointer lets the compiler or a weakly-ordered CPU (e.g. Arm/Power) speculate the dependent load ahead of the pointer read, so you read fields of the old object through the new pointer. The macros carry the required dependency/consume ordering.
Unbounded memory in EBR/RCU vs bounded in hazard pointers. If update rate spikes while a grace period drags, call_rcu callbacks queue up; the kernel has shipped OOM incidents from callback floods (hence rcu_barrier and callback throttling).

Trade-offs: choosing among Hazard Pointers, EBR, and RCU

All three answer the same question — when is it safe to free? — with a different unit of accounting: hazard pointers track which objects are referenced, EBR and RCU track which time interval a reader is active in. That difference in accounting is exactly why their cost and risk profiles diverge.

Dimension	Hazard Pointers	Epoch-Based Reclamation	RCU
Reader cost	One store + fence per protected pointer — scales with objects touched.	One release-store pin per critical section, amortized across every object touched inside it.	Near zero — disable preemption, no atomics, no shared-memory writes on common architectures.
Memory bound	Bounded: a stalled reader blocks freeing of only the O(H) nodes it hazards.	Unbounded: one stalled/un-pinned thread freezes the epoch and every subsequent retiree piles into limbo.	Unbounded: one CPU that never reaches a quiescent state stalls the grace period and every `call_rcu` callback since.
Writer/updater cost	Pays for the periodic scan over all hazard slots (O(R·H)); no blocking on readers.	Pays for advancing the epoch, which needs every active thread observed — can stall if a thread doesn't pin/unpin often.	`synchronize_rcu()` blocks the updater for a full grace period, or `call_rcu()` defers the free but still queues work.
Reader/writer relationship	Fully decoupled — a slow reader never blocks a writer's progress, only delays reclamation of the nodes it holds.	Decoupled in the same sense, but the blast radius of a stuck reader is global (all limbo lists), not per-object.	A blocking updater (`synchronize_rcu`) is effectively stalled by the slowest in-flight reader — the closest thing to a reader blocking a writer among the three.
Sleeping in read section	Fine — a hazard slot has no time dimension, only object identity.	Dangerous — sleeping while pinned freezes the epoch.	Forbidden in classic RCU (breaks quiescent-state detection); use SRCU if readers must block.
Implementation weight	Moderate — per-object bookkeeping, scan logic, no kernel/runtime support required.	Light — a global counter, per-thread slot, limbo lists; easy to retrofit into userspace (crossbeam-epoch).	Heavy — needs deep runtime/kernel integration to detect quiescent states cheaply; that machinery is what makes the reader path free.

When to pick which

Pick hazard pointers when you need a hard, provable bound on unreclaimed memory regardless of how badly a reader stalls (real-time systems, adversarial/untrusted reader latency) and can afford a per-pointer store+fence.
Pick EBR when reads vastly outnumber writes, you control the runtime (userspace, not kernel), and you can guarantee readers never block inside a critical section — it is the pragmatic default for concurrent data structures in Rust/C++ (crossbeam-epoch, Folly's Hazptr alternative).
Pick RCU when the reader path must be as close to free as physically possible and you either own kernel-level integration or can use a mature userspace port (liburcu) — accept that updaters pay real latency and that reclamation is not object-granular but interval-granular, so a single slow reader anywhere delays every pending free.

🤖 Don't fully get this? Learn it with Claude

Stuck on Safe Memory Reclamation (Hazard Pointers, Epochs, RCU)? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🎨 Explain it visually

Build the mental picture, not memorization.

I just read a lesson on **Safe Memory Reclamation (Hazard Pointers, Epochs, RCU)** (Concurrency) and want to truly understand it. Explain Safe Memory Reclamation (Hazard Pointers, Epochs, RCU) from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🤔 Walk me through it (interactive)

Socratic — adapts to where you're stuck.

Teach me **Safe Memory Reclamation (Hazard Pointers, Epochs, RCU)** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧪 Quiz me & fix my gaps

Active recall exposes what you missed.

Quiz me on **Safe Memory Reclamation (Hazard Pointers, Epochs, RCU)** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧠 Make it stick

Intuition + hook + flashcards for long-term memory.

Help me remember **Safe Memory Reclamation (Hazard Pointers, Epochs, RCU)** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

📝 My notes

← Lock-Free Stack & Queue (Treiber,