Home › Concurrency › Lock-Free & Memory Ordering

Compare-And-Swap (CAS) & the ABA Problem

Compare-and-swap is a single CPU instruction that reads a memory word, checks it still equals a value you expected, and writes a new value — all as one indivisible step that no other core can interleave. On x86 this is LOCK CMPXCHG: the LOCK prefix makes the core acquire exclusive ownership of the target cache line (MESI 'M/E' state) for the read-compare-write, so any competing core's write either happened strictly before or strictly after — never in the middle. On ARM/POWER the same effect is built from a load-linked / store-conditional pair (LDXR/STXR): the store succeeds only if nobody touched the line since the load. Either way the hardware gives you an atomic conditional write, and that one primitive is the foundation every lock-free data structure is built on.

Why it matters

A mutex serializes threads: only one runs the critical section, the rest block, and if the holder is preempted (or crashes) everyone waits. CAS lets many threads race optimistically — each computes its update against a snapshot and commits only if nothing changed underneath it. There is no lock to hold, so a thread that stalls mid-operation cannot freeze the others (lock-freedom: system-wide progress is guaranteed). This is how AtomicLong.incrementAndGet, Java's ConcurrentHashMap bucket updates, and Go's runtime scheduler counters avoid a global lock on the hot path. The cost: you must be able to express the update as 'read → compute → conditionally commit → retry on failure,' and you inherit a subtle correctness trap — the ABA problem.

The optimistic retry loop

Because CAS can fail (someone else committed first), you wrap it in a loop: snapshot the current value, compute the new value from that snapshot, attempt to commit, and retry from a fresh snapshot on failure. A failed CAS is not an error — it is the signal 'the world moved; recompute.'

Trace a concurrent counter at start value 41 with threads T1 and T2 both incrementing:

step	T1	T2	counter
1	read cur=41, next=42	read cur=41, next=42	41
2	CAS(41→42) ✓ true	—	42
3	—	CAS(41→42) ✗ false (M is 42)	42
4	—	retry: read cur=42, next=43	42
5	—	CAS(42→43) ✓ true	43

No update was lost — T2's stale CAS failed and it recomputed against the new value. Java and Go express the identical loop:

// Java — java.util.concurrent.atomic
final AtomicLong counter = new AtomicLong(41);

long incrementAndGet() {
    long cur, next;
    do {
        cur  = counter.get();      // snapshot
        next = cur + 1;            // compute off the snapshot
    } while (!counter.compareAndSet(cur, next)); // commit or retry
    return next;
}

// Go — sync/atomic
var counter int64 = 41

func incrementAndGet() int64 {
    for {
        cur  := atomic.LoadInt64(&counter)
        next := cur + 1
        if atomic.CompareAndSwapInt64(&counter, cur, next) {
            return next
        }
    }
}

Runtime note: the JVM lowers compareAndSet to a hardware CAS intrinsic (via Unsafe/VarHandle) — no JNI, no lock. Go's atomic.CompareAndSwapInt64 is an assembly intrinsic likewise. Both give sequentially-consistent ordering for the atomic itself, so the loop is also a memory fence.

The ABA problem

CAS compares values (or pointer bits), not history. It cannot tell 'unchanged the whole time' apart from 'changed to B and back to A.' If the word held A when you read it and holds A again when you CAS, the swap succeeds — even though the world you assumed is gone. For a plain counter this is harmless (42 is 42). For a pointer it is a real bug, because the address may have been freed and recycled to a different logical node.

Trace a lock-free Treiber stack holding A → B → C (top = A). T1 starts a pop:

T1 reads cur = A, computes next = A.next = B. It is about to CAS(top, A, B) — then the scheduler preempts it.
T2 runs freely: pop() returns A (top → B), pop() returns B (top → C).
T2 reuses node A (from a free list / object pool) and pushes it: top → A, and A.next = C. Stack is now A → C.
T1 resumes and executes CAS(top, A, B). top is A (same address / same reference), so CAS succeeds and sets top = B.

Disaster: B was already popped. top now points at a detached, possibly-freed node; C is leaked out of the stack. The structure is silently corrupted by a CAS that 'succeeded.' The read and the CAS both saw A — ABA slipped between them.

The fixes: version the word, not just the value

All fixes make the word carry identity over time so a round-trip A→B→A is detectable:

Tagged / versioned pointer (stamp). Pair the pointer with a monotonically increasing counter and CAS the pair. Even if the pointer returns to A, the stamp advanced, so the CAS on (A, stamp) fails. This is the standard fix.
Double-width CAS (DWCAS). Steal a hardware instruction that swaps two adjacent words atomically — x86-64 LOCK CMPXCHG16B (128-bit: 64-bit pointer + 64-bit tag). C++ exposes it via a named struct: struct TaggedPtr { Node* ptr; uint64_t tag; }; std::atomic<TaggedPtr> top; top.compare_exchange_strong(expected, desired); — the type has to be named because you cannot pass an anonymous struct as a template argument. A 64-bit tag effectively never wraps, so ABA is eliminated in practice.
Pointer tag bits. When alignment leaves low/high pointer bits free, pack a small counter into them and use ordinary single-word CAS. Cheap but the tag is tiny, so it can wrap — a risk on very hot structures.
Safe reclamation (hazard pointers, epoch-based reclamation, RCU): don't reuse a node's memory while any thread might still hold it, which removes the recycling that makes ABA harmful. This is what production allocators/GCs lean on.

Java packages the versioned-pointer fix as AtomicStampedReference<V> — a reference plus an int stamp, CAS'd together:

// Java — ABA-safe Treiber pop using AtomicStampedReference
static final class Node<E> { final E item; Node<E> next; Node(E i){item=i;} }
final AtomicStampedReference<Node<E>> top =
        new AtomicStampedReference<>(null, 0);

E pop() {
    int[] stampHolder = new int[1];
    Node<E> cur, next;
    do {
        cur = top.get(stampHolder);        // read ref + its stamp
        if (cur == null) return null;
        next = cur.next;
    } while (!top.compareAndSet(
                cur, next,                 // expected ref, new ref
                stampHolder[0], stampHolder[0] + 1)); // expected stamp, new stamp
    return cur.item;
}

Now the T1/T2 trace fails safely: after T2's pop/pop/push the stamp is 3, but T1 still expects stamp 0, so compareAndSet(A, B, 0, 1) returns false and T1 harmlessly retries. Go has no built-in stamped reference; you build the same thing with atomic.Uint64 packing (pointer index + version) or atomic.Pointer[T] plus epoch-based reclamation.

Pitfalls a working engineer hits

'GC means no ABA' is false. A tracing GC stops a node from being freed while reachable, which prevents the classic use-after-free ABA. But the moment you recycle nodes yourself — object pools, free lists, reused array slots — the same reference reappears and ABA returns. Node pooling is common precisely on the hot lock-free paths that most need CAS.
ABA only bites pointers/handles, not scalars. Don't reach for a stamped reference on a plain counter; the added CAS width and allocation are pure overhead there.
AtomicStampedReference boxes. Each instance wraps ref+stamp in an internal Pair object allocated on every successful update — real allocation churn under contention. If you can spare pointer bits, tag-in-pointer avoids the box.
Livelock, not deadlock. Lock-free CAS loops never deadlock, but under heavy contention threads can spin retrying repeatedly (progress is guaranteed for some thread, not fairness for a given one). Watch CPU burn on the retry loop; consider backoff.
Spurious LL/SC failures. On ARM/POWER, STXR can fail even with no real conflict (context switch, cache eviction). Correct code already retries, but a tight loop with a function call between LL and SC may never make progress — keep the CAS window minimal.
ABA can hide behind 'it worked in testing.' The window is a preemption between your read and CAS; it surfaces only under real contention and recycling, often first in production. Reason about it, don't test for it.

Trade-offs & when to use CAS vs a mutex

The named alternative is a blocking lock (ReentrantLock / synchronized in Java, sync.Mutex in Go).

	CAS / lock-free	Mutex (blocking lock)
Progress	Lock-free: a stalled thread never blocks others	Holder stall/crash blocks everyone
Contention behavior	Optimistic; wasted work + retries as contention rises	Pessimistic; parks waiters (no wasted CPU spinning)
Critical section	Must fit one word (or a small tuple via DWCAS)	Arbitrary multi-word invariants, freely
Correctness cost	ABA, memory ordering, reclamation — subtle	Simple to reason about

Use CAS when the update is a single word or pointer, contention is low-to-moderate, and you cannot tolerate one thread blocking the rest (hot counters, single-linked lock-free stacks/queues, flags, futures resolving once). Use a mutex when the invariant spans multiple fields (transfer between two accounts, resize-then-rehash), when contention is high enough that spinning wastes more than parking, or simply when clarity matters more than the last ounce of throughput. A senior default: reach for Atomic* for a single value; reach for a lock the moment two fields must change together — don't contort several CAS'd words to fake a multi-word transaction.

Takeaways

CAS is one atomic read-compare-write (LOCK CMPXCHG / LL-SC); it is the single primitive under all lock-free code, always used inside a snapshot-compute-commit-retry loop where a failed CAS just means 'retry.'
CAS compares values, not history — so A→B→A looks unchanged. That's the ABA problem, and it corrupts pointer-based structures (freed/recycled nodes), not scalar counters.
Fix ABA by versioning the word: a stamp (Java AtomicStampedReference), double-width CAS (CMPXCHG16B), pointer tag bits, or safe reclamation (hazard pointers / epochs / RCU).
Choose CAS for single-word, progress-critical hot paths; choose a mutex the moment an invariant spans multiple fields.

Recall

A lock-free stack uses a plain AtomicReference<Node> and a per-thread free list that recycles popped nodes. Under a GC that never frees a reachable node, can ABA still corrupt the stack — and if so, exactly when? (Answer: yes. GC prevents use-after-free, but the free list re-pushes the same node object, so its reference reappears at top; a thread that read the old top and stalled will have its CAS(top, A, oldNext) succeed against a node whose next now points elsewhere — detaching live nodes. The stamp/version is what closes the window.)

Sources: Herlihy & Shavit, The Art of Multiprocessor Programming (CAS, Treiber stack, ABA, LL/SC); Goetz et al., Java Concurrency in Practice (atomics, nonblocking algorithms); Michael & Scott, Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms (PODC 1996); Maged Michael, Hazard Pointers (IEEE TPDS 2004); Intel 64 & IA-32 SDM Vol.2 (CMPXCHG/CMPXCHG16B, LOCK); JDK javadocs for AtomicReference / AtomicStampedReference; Go sync/atomic docs. Re-authored/Deepened for this guide.

🤖 Don't fully get this? Learn it with Claude

Stuck on Compare-And-Swap (CAS) & the ABA Problem? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🎨 Explain it visually

Build the mental picture, not memorization.

I just read a lesson on **Compare-And-Swap (CAS) & the ABA Problem** (Concurrency) and want to truly understand it. Explain Compare-And-Swap (CAS) & the ABA Problem from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🤔 Walk me through it (interactive)

Socratic — adapts to where you're stuck.

Teach me **Compare-And-Swap (CAS) & the ABA Problem** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧪 Quiz me & fix my gaps

Active recall exposes what you missed.

Quiz me on **Compare-And-Swap (CAS) & the ABA Problem** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧠 Make it stick

Intuition + hook + flashcards for long-term memory.

Help me remember **Compare-And-Swap (CAS) & the ABA Problem** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

📝 My notes

← Memory Ordering, Barriers & Happen Lock-Free Stack & Queue (Treiber, →