Knowledge Guide
HomeDSAGraphs

Union-Find (Disjoint Set Union)

Union-Find (Disjoint Set Union)

A Union-Find (DSU) structure maintains a partition of n elements into disjoint sets by storing, for each element, a pointer toward a representative (root) of its set; find(x) walks parent pointers to the root, and union(x,y) links the two roots together — so two elements are in the same set exactly when their roots coincide, and merging sets costs only a pointer update rather than relabeling every member.

Recognize the pattern

Baseline vs. this algorithm

Naive baseline — graph traversal: model connectivity as an adjacency list and run BFS/DFS from a node whenever you need to answer "is x connected to y?" Each query costs O(V+E). If you instead try to maintain an explicit component-id array and merge two components by relabeling every node in one of them, a single merge costs O(size of the component being relabeled), which is O(n) in the worst case — and an adversarial sequence of n merges can cost O(n²) total.

Union-Find: represent each set as a small in-tree (parent pointers, not a full adjacency structure). find(x) costs O(height of x's tree). union(x,y) costs one find on each side plus O(1) to re-point a root. Two optimizations make the trees shallow:

Used together, the amortized cost per operation is O(α(n)), where α is the inverse Ackermann function — for any n that fits in the physical universe, α(n) ≤ 4, so this is "practically constant."

Complexity, derived from first principles

Without either optimization: a chain can form (union always attaches root of x under root of y, y under z, ...), giving tree height O(n), so find is O(n) worst case.

With union by size alone (no compression): every time a node's depth increases by 1, it just moved from a tree of size s into a merged tree of size ≥ 2s (because we always attach the smaller side, so the tree it joined was at least as big as its own). A node can only be on the "smaller side" O(log n) times before the total number of elements n is exhausted, since size at least doubles from that node's perspective each time its depth grows. Hence height ≤ O(log n) and find is O(log n) worst case.

Adding path compression: a single find can still touch O(log n) nodes on a cold path, but every node it touches gets re-attached directly to the root — so the next find through any of them is O(1). This combined bound is Tarjan's classic amortized-analysis result (a potential-function / block argument over ranks): any sequence of m union/find operations on n elements costs O(m·α(n)) total, i.e., O(α(n)) amortized per operation — cited here rather than re-derived, since the full block argument is a page of its own, but the O(log n) case above is fully derived from first principles. Space is O(n): one parent slot and one rank/size slot per element — no adjacency list, no visited array.

Variantfind / union (amortized)Space
Naive relabelingO(n) per unionO(n)
Tree, no optimizationO(n) worst caseO(n)
Union by size/rank onlyO(log n)O(n)
Union by size/rank + path compressionO(α(n)) ≈ O(1)O(n)

Traced worked example

Elements 0..6, all start as singleton sets: parent[i] = i, size[i] = 1. Process unions in this order: union(0,1), union(1,2), union(3,4), union(5,6), union(2,3), find(6) with path compression.

StepOperationRoots beforeAction (smaller size attaches under larger)Parent array after (index 0..6)
1union(0,1)root(0)=0 (size1), root(1)=1 (size1)tie → attach 1 under 0; size[0]=20,0,2,3,4,5,6
2union(1,2)root(1)=0 (size2), root(2)=2 (size1)attach 2 under 0; size[0]=30,0,0,3,4,5,6
3union(3,4)root(3)=3 (size1), root(4)=4 (size1)tie → attach 4 under 3; size[3]=20,0,0,3,3,5,6
4union(5,6)root(5)=5 (size1), root(6)=6 (size1)tie → attach 6 under 5; size[5]=20,0,0,3,3,5,5
5union(2,3)root(2)=0 (size3), root(3)=3 (size2)attach smaller root 3 under 0; size[0]=50,0,0,0,3,5,5
6find(6)6→5→5 (root 5, depth 1)no compression needed (already depth 1); returns 5unchanged

Note element 4 sits at depth 2 (4→3→0). A subsequent find(4) would walk 4→3→0, then path-compression re-points parent[4] = 0 directly, so any later find(4) is O(1).

Java implementation

final class UnionFind {
    private final int[] parent;
    private final int[] size;
    private int components;

    UnionFind(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
        components = n;
    }

    // Path compression: point every visited node straight at the root.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    // Union by size: attach smaller tree under larger tree's root.
    // Returns false if x and y were already in the same set (cycle detected).
    boolean union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false;
        if (size[rx] < size[ry]) { int t = rx; rx = ry; ry = t; }
        parent[ry] = rx;
        size[rx] += size[ry];
        components--;
        return true;
    }

    boolean connected(int x, int y) {
        return find(x) == find(y);
    }

    int componentCount() {
        return components;
    }
}

// Kruskal's MST sketch: sort edges by weight, then union(u, v) each edge
// only if union() returns true (i.e., it does not close a cycle).

Pitfalls

When to use / when NOT — trade-offs vs. DFS-based connectivity

Union-FindDFS/BFS connectivity
Best forIncremental unions interleaved with queries; no need to traverse full structure; cycle detection while adding edges one at a time (Kruskal)Static graph, connectivity known once graph is fully built; need actual paths, distances, or full component contents in traversal order
Query costO(α(n)) amortized per connectivity checkO(V+E) to build components once, then O(1) per query using precomputed labels — but O(V+E) again if edges keep arriving
Handles streaming edges wellYes — designed for itNo — must re-run traversal (or incrementally patch labels, which is what DSU already does better)
Can report the actual path between two nodesNo — only "same set or not"Yes — DFS/BFS naturally reconstructs a path
Can undo a merge / handle edge deletionNo (plain DSU)Yes, by rebuilding or maintaining adjacency and re-traversing
Implementation weightTwo small arrays, ~15 linesAdjacency list + visited array + recursion/stack

Use Union-Find when the workload is "many unions, many connectivity queries, edges only ever added, no path needed" — Kruskal's MST, "redundant connection," incremental network connectivity, offline grid flood-fill counted via merges. Prefer DFS/BFS when you need the shortest/actual path, need to handle edge removal, or only need connectivity computed once on a fixed, complete graph (a single O(V+E) pass is simpler to write and reason about than standing up a DSU for a one-shot query).

Takeaways

Recall question

You are given m union operations to process on n elements, and only union by size is applied (no path compression). What is the tight worst-case bound on the height of any resulting tree, and why does always attaching the smaller-sized tree under the larger one's root guarantee that bound?


Authored for this guide; concepts and amortized-complexity analysis follow the standard treatment in Cormen/Leiserson/Rivest/Stein (CLRS), "Introduction to Algorithms," ch. 21 (Data Structures for Disjoint Sets), and Tarjan's original inverse-Ackermann analysis of union-find.

🤖 Don't fully get this? Learn it with Claude

Stuck on Union-Find (Disjoint Set Union)? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🎨 Explain it visually

Build the mental picture, not memorization.

I just read a lesson on **Union-Find (Disjoint Set Union)** (DSA) and want to truly understand it. Explain Union-Find (Disjoint Set Union) from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
🤔 Walk me through it (interactive)

Socratic — adapts to where you're stuck.

Teach me **Union-Find (Disjoint Set Union)** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
🧪 Quiz me & fix my gaps

Active recall exposes what you missed.

Quiz me on **Union-Find (Disjoint Set Union)** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
🧠 Make it stick

Intuition + hook + flashcards for long-term memory.

Help me remember **Union-Find (Disjoint Set Union)** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

📝 My notes