Home › DSA › Graphs

Topological Sort

Topological sort works because a directed acyclic graph (DAG) has at least one node with in-degree zero at every stage of peeling — repeatedly stripping off a source (a node with no unprocessed prerequisites) and recording it produces a linear order where every edge u -> v has u appearing before v; this is only possible when the graph has no cycles, since a cycle would leave every node in it permanently waiting on another.

Recognize the pattern

The problem talks about dependencies, prerequisites, or ordering: "build A before B", "course X requires course Y", "task must run after task Z".
Relationships form a directed graph, and cycles would make the task impossible — e.g. "course schedule", "build system", "package manager dependency resolution", "compilation order".
You need any one valid linear order, not shortest path or minimum cost — the question is feasibility and sequencing, not weight.
Phrases like "can all tasks be completed", "is there a valid order", "detect a deadlock/cycle" are strong tells.

Brute force vs. the real algorithm

Naive baseline: repeatedly scan all V nodes, find any node whose prerequisites are already placed, place it, remove it, and restart the scan from the top. Each full scan costs O(V + E) (check every node's incoming edges), and you repeat the scan V times → O(V · (V + E)) total. It works but rediscovers the same in-degree information from scratch every round.

Kahn's algorithm (BFS-flavored, in-degree counting) and the DFS finish-time algorithm both keep running state so every node and edge is examined a constant number of times overall, dropping the cost to linear.

Kahn's algorithm (BFS on in-degrees)

Compute in-degree[v] for every node (count of incoming edges).
Push all nodes with in-degree 0 into a queue — these have no unmet prerequisite.
Pop a node u, append it to the result order.
For each edge u→v, decrement in-degree[v]. If it hits 0, push v.
Repeat until the queue is empty. If the result has fewer than V nodes, a cycle exists among the leftover nodes — no valid order.

import java.util.*;

public class TopoKahn {
    // returns topological order, or empty list if a cycle exists
    public static List<Integer> topoSort(int n, List<List<Integer>> adj) {
        int[] indeg = new int[n];
        for (int u = 0; u < n; u++)
            for (int v : adj.get(u)) indeg[v]++;

        Deque<Integer> queue = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (indeg[v] == 0) queue.add(v);

        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            int u = queue.poll();
            order.add(u);
            for (int v : adj.get(u)) {
                if (--indeg[v] == 0) queue.add(v);
            }
        }
        if (order.size() != n) return new ArrayList<>(); // cycle detected
        return order;
    }
}

DFS finish-time algorithm

Run DFS from every unvisited node. When a node has no more unexplored outgoing edges (it "finishes"), push it onto a stack. Because a node only finishes after all nodes reachable from it have already finished, reversing the finish order (or popping the stack) yields a valid topological order. Cycle detection needs a third color: a node currently on the recursion stack (gray) reached again means a back-edge → cycle.

import java.util.*;

public class TopoDFS {
    private static final int WHITE = 0, GRAY = 1, BLACK = 2;
    private static boolean hasCycle;

    public static List<Integer> topoSort(int n, List<List<Integer>> adj) {
        hasCycle = false; // reset shared state: this method can be called more than once
        int[] color = new int[n];
        Deque<Integer> finishStack = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (color[v] == WHITE) dfs(v, adj, color, finishStack);
        }
        if (hasCycle) return new ArrayList<>();
        List<Integer> order = new ArrayList<>();
        while (!finishStack.isEmpty()) order.add(finishStack.pop());
        return order;
    }

    private static void dfs(int u, List<List<Integer>> adj, int[] color, Deque<Integer> finishStack) {
        color[u] = GRAY;
        for (int v : adj.get(u)) {
            if (color[v] == GRAY) { hasCycle = true; return; }      // back-edge -> cycle
            if (color[v] == WHITE) dfs(v, adj, color, finishStack);
        }
        color[u] = BLACK;
        finishStack.push(u); // record finish time
    }
}

Why the reset matters: hasCycle is state shared across calls (a static field). Without resetting it to false at the top of topoSort, one earlier call on a cyclic graph permanently poisons every later call — even a perfectly valid DAG passed in afterward would incorrectly report a cycle forever, because the stale true from the previous run is never cleared. In real code prefer avoiding static/shared mutable state entirely (e.g. return a small result record or thread it through as a local boolean[1]) so the bug class can't recur when someone adds concurrency later.

Complexity, derived from first principles

Kahn's algorithm: building in-degrees scans every edge once → O(E). Each node is pushed to the queue exactly once (when its in-degree first hits 0) and popped exactly once → O(V) queue operations. Each edge is examined exactly once when its source is popped (to decrement the target's in-degree) → O(E). Total time O(V + E). Space: in-degree array O(V), queue holds at most V nodes O(V), result list O(V) → O(V) auxiliary space (plus the O(V+E) adjacency list itself).

DFS algorithm: each node is visited (colored gray then black) exactly once, and the inner loop over each node's outgoing edges runs once per node → total edge examinations across all calls = O(E), total node visits = O(V). Total time O(V + E). Space: color array O(V), recursion stack up to O(V) in the worst case (a long chain), finish stack O(V) → O(V) auxiliary space.

Both are asymptotically identical; the constant-factor difference is recursion overhead (DFS) vs. explicit queue management (Kahn's).

Traced worked example

Graph (6 build targets 0–5) with edges: 5→2, 5→0, 4→0, 4→1, 2→3, 3→1. Read "5→2" as "5 must build before 2".

Kahn's trace

Initial in-degrees: 0:2, 1:2, 2:1, 3:1, 4:0, 5:0.

Step	Pop	Order so far	Updates	Queue after
0	—	[]	seed queue with in-degree-0 nodes	[4, 5]
1	4	[4]	indeg[0]: 2→1, indeg[1]: 2→1	[5]
2	5	[4,5]	indeg[2]: 1→0 (push 2), indeg[0]: 1→0 (push 0)	[2,0]
3	2	[4,5,2]	indeg[3]: 1→0 (push 3)	[0,3]
4	0	[4,5,2,0]	no outgoing edges	[3]
5	3	[4,5,2,0,3]	indeg[1]: 1→0 (push 1)	[1]
6	1	[4,5,2,0,3,1]	no outgoing edges	[]

Result has all 6 nodes → valid order 4, 5, 2, 0, 3, 1.

DFS trace

Visit nodes in id order 0..5, pushing to a finish-stack on completion.

DFS(0): no out-edges → finish 0. Stack: [0]
DFS(1): no out-edges → finish 1. Stack: [0,1]
DFS(2): visits 3 first (3→1 already black, skip) → finish 3, then finish 2. Stack: [0,1,3,2]
DFS(3): already black, skip.
DFS(4): edges to 0 (black) and 1 (black), nothing new → finish 4. Stack: [0,1,3,2,4]
DFS(5): edges to 2 (black) and 0 (black) → finish 5. Stack: [0,1,3,2,4,5]

Popping the stack gives order 5, 4, 2, 3, 1, 0 — a different but equally valid topological order (both satisfy every edge constraint; topological order is not unique).

Pitfalls

Cycles silently ignored: if you forget to check order.size() == n (Kahn's) or track the gray/recursion-stack state (DFS), a cyclic graph produces a partial, wrong order instead of an error.
Stale shared state across calls: a flag like hasCycle that lives outside the function (a static/instance field) must be reset at the start of every call; otherwise one cycle-containing run poisons every subsequent call, even on a valid DAG. Prefer returning the flag as part of the result instead of mutating shared state.
White/black-only DFS coloring: using just visited/unvisited (2 colors) instead of white/gray/black cannot distinguish "currently being explored above me on the stack" (a real back-edge/cycle) from "already fully explored elsewhere" (a harmless cross-edge in a DAG) — this is the single most common correctness bug in DFS-based topo sort.
Off-by-one on push timing: in the DFS version, pushing a node to the finish-stack before recursing into its neighbors (instead of after) produces a completely invalid order — the push must happen on finish, not on discovery.
Disconnected components: a single DFS call or a single BFS seed will miss nodes in other components; you must loop over all nodes and start DFS/seed the queue from every unvisited/in-degree-0 node, not just node 0.
Non-uniqueness surprises: multiple valid orders usually exist (see the two different results above); don't assume there is one "correct" answer to check against — verify by checking every edge points forward in your output order instead.
Negative or weighted edges are irrelevant here — topo sort ignores weights entirely; conflating it with shortest-path algorithms (which do care about edge weights/signs) is a category error.

When to use / when not, and trade-offs

Use topological sort when: you have a DAG of dependencies and need any valid linear execution order, or you need to detect whether a cyclic dependency makes the task impossible (circular build dependency, deadlocked course prerequisites, a cyclic import graph).

Do not use it when: edges have weights and you need the cheapest/longest path through the DAG — topo order is a building block for DAG shortest/longest path (process nodes in topo order, relax edges once each), not a replacement for it. Also skip it if the graph is undirected, or if you need all valid orderings (that requires backtracking, not a single linear pass).

Approach	Time	Space	Cycle detection	Style
Kahn's (in-degree + queue)	O(V+E)	O(V)	leftover nodes after queue empties	iterative, easy to explain, gives BFS-like "levels" for free
DFS finish-time	O(V+E)	O(V)	gray node revisited (back-edge)	recursive, natural if you already have DFS infrastructure; risk of stack overflow on very deep/skewed graphs
Naive repeated scan	O(V·(V+E))	O(V)	no progress in a full pass	simplest to write, too slow beyond small inputs

Prefer Kahn's when you also want to process nodes level-by-level (e.g. parallel task scheduling: everything in one queue-pop "wave" can run concurrently) or when recursion depth is a concern. Prefer DFS-based when you're already running DFS for other reasons (e.g. computing SCCs via Tarjan/Kosaraju, which uses the same finish-time idea).

Takeaways

Topological sort only exists for DAGs; its core job is often really cycle detection disguised as ordering.
Kahn's algorithm peels off in-degree-0 sources with a queue; DFS achieves the same result by reversing post-order finish times — both run in O(V+E) time and O(V) extra space.
The order is generally not unique — validate any candidate answer by checking that every edge points forward, not by string-matching a "canonical" order.
It's a building block: DAG shortest/longest path algorithms process nodes in topological order and relax each edge exactly once.

Recall question

You run Kahn's algorithm on a graph with 10 nodes and the final output list has only 7 nodes. What do you conclude, and what do the missing 3 nodes have in common?

Authored for this guide; concepts and complexity analysis follow the standard treatment in CLRS (Cormen, Leiserson, Rivest, Stein), "Introduction to Algorithms", chapter on Elementary Graph Algorithms, and Kahn (1962), "Topological sorting of large networks".

🤖 Don't fully get this? Learn it with Claude

Stuck on Topological Sort? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🎨 Explain it visually

Build the mental picture, not memorization.

I just read a lesson on **Topological Sort** (DSA) and want to truly understand it. Explain Topological Sort from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🤔 Walk me through it (interactive)

Socratic — adapts to where you're stuck.

Teach me **Topological Sort** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧪 Quiz me & fix my gaps

Active recall exposes what you missed.

Quiz me on **Topological Sort** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧠 Make it stick

Intuition + hook + flashcards for long-term memory.

Help me remember **Topological Sort** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

📝 My notes

← Union-Find (Disjoint Set Union)Minimum Spanning Tree — Kruskal & →