Topological Sort
Topological sort works because a directed acyclic graph (DAG) has at least one node with in-degree zero at every stage of peeling — repeatedly stripping off a source (a node with no unprocessed prerequisites) and recording it produces a linear order where every edge u -> v has u appearing before v; this is only possible when the graph has no cycles, since a cycle would leave every node in it permanently waiting on another.
Recognize the pattern
- The problem talks about dependencies, prerequisites, or ordering: "build A before B", "course X requires course Y", "task must run after task Z".
- Relationships form a directed graph, and cycles would make the task impossible — e.g. "course schedule", "build system", "package manager dependency resolution", "compilation order".
- You need any one valid linear order, not shortest path or minimum cost — the question is feasibility and sequencing, not weight.
- Phrases like "can all tasks be completed", "is there a valid order", "detect a deadlock/cycle" are strong tells.
Brute force vs. the real algorithm
Naive baseline: repeatedly scan all V nodes, find any node whose prerequisites are already placed, place it, remove it, and restart the scan from the top. Each full scan costs O(V + E) (check every node's incoming edges), and you repeat the scan V times → O(V · (V + E)) total. It works but rediscovers the same in-degree information from scratch every round.
Kahn's algorithm (BFS-flavored, in-degree counting) and the DFS finish-time algorithm both keep running state so every node and edge is examined a constant number of times overall, dropping the cost to linear.
Kahn's algorithm (BFS on in-degrees)
- Compute in-degree[v] for every node (count of incoming edges).
- Push all nodes with in-degree 0 into a queue — these have no unmet prerequisite.
- Pop a node u, append it to the result order.
- For each edge u→v, decrement in-degree[v]. If it hits 0, push v.
- Repeat until the queue is empty. If the result has fewer than V nodes, a cycle exists among the leftover nodes — no valid order.
import java.util.*;
public class TopoKahn {
// returns topological order, or empty list if a cycle exists
public static List<Integer> topoSort(int n, List<List<Integer>> adj) {
int[] indeg = new int[n];
for (int u = 0; u < n; u++)
for (int v : adj.get(u)) indeg[v]++;
Deque<Integer> queue = new ArrayDeque<>();
for (int v = 0; v < n; v++)
if (indeg[v] == 0) queue.add(v);
List<Integer> order = new ArrayList<>();
while (!queue.isEmpty()) {
int u = queue.poll();
order.add(u);
for (int v : adj.get(u)) {
if (--indeg[v] == 0) queue.add(v);
}
}
if (order.size() != n) return new ArrayList<>(); // cycle detected
return order;
}
}DFS finish-time algorithm
Run DFS from every unvisited node. When a node has no more unexplored outgoing edges (it "finishes"), push it onto a stack. Because a node only finishes after all nodes reachable from it have already finished, reversing the finish order (or popping the stack) yields a valid topological order. Cycle detection needs a third color: a node currently on the recursion stack (gray) reached again means a back-edge → cycle.
import java.util.*;
public class TopoDFS {
private static final int WHITE = 0, GRAY = 1, BLACK = 2;
private static boolean hasCycle;
public static List<Integer> topoSort(int n, List<List<Integer>> adj) {
hasCycle = false; // reset shared state: this method can be called more than once
int[] color = new int[n];
Deque<Integer> finishStack = new ArrayDeque<>();
for (int v = 0; v < n; v++) {
if (color[v] == WHITE) dfs(v, adj, color, finishStack);
}
if (hasCycle) return new ArrayList<>();
List<Integer> order = new ArrayList<>();
while (!finishStack.isEmpty()) order.add(finishStack.pop());
return order;
}
private static void dfs(int u, List<List<Integer>> adj, int[] color, Deque<Integer> finishStack) {
color[u] = GRAY;
for (int v : adj.get(u)) {
if (color[v] == GRAY) { hasCycle = true; return; } // back-edge -> cycle
if (color[v] == WHITE) dfs(v, adj, color, finishStack);
}
color[u] = BLACK;
finishStack.push(u); // record finish time
}
}Why the reset matters: hasCycle is state shared across calls (a static field). Without resetting it to false at the top of topoSort, one earlier call on a cyclic graph permanently poisons every later call — even a perfectly valid DAG passed in afterward would incorrectly report a cycle forever, because the stale true from the previous run is never cleared. In real code prefer avoiding static/shared mutable state entirely (e.g. return a small result record or thread it through as a local boolean[1]) so the bug class can't recur when someone adds concurrency later.
Complexity, derived from first principles
Kahn's algorithm: building in-degrees scans every edge once → O(E). Each node is pushed to the queue exactly once (when its in-degree first hits 0) and popped exactly once → O(V) queue operations. Each edge is examined exactly once when its source is popped (to decrement the target's in-degree) → O(E). Total time O(V + E). Space: in-degree array O(V), queue holds at most V nodes O(V), result list O(V) → O(V) auxiliary space (plus the O(V+E) adjacency list itself).
DFS algorithm: each node is visited (colored gray then black) exactly once, and the inner loop over each node's outgoing edges runs once per node → total edge examinations across all calls = O(E), total node visits = O(V). Total time O(V + E). Space: color array O(V), recursion stack up to O(V) in the worst case (a long chain), finish stack O(V) → O(V) auxiliary space.
Both are asymptotically identical; the constant-factor difference is recursion overhead (DFS) vs. explicit queue management (Kahn's).
Traced worked example
Graph (6 build targets 0–5) with edges: 5→2, 5→0, 4→0, 4→1, 2→3, 3→1. Read "5→2" as "5 must build before 2".
Kahn's trace
Initial in-degrees: 0:2, 1:2, 2:1, 3:1, 4:0, 5:0.
| Step | Pop | Order so far | Updates | Queue after |
|---|---|---|---|---|
| 0 | — | [] | seed queue with in-degree-0 nodes | [4, 5] |
| 1 | 4 | [4] | indeg[0]: 2→1, indeg[1]: 2→1 | [5] |
| 2 | 5 | [4,5] | indeg[2]: 1→0 (push 2), indeg[0]: 1→0 (push 0) | [2,0] |
| 3 | 2 | [4,5,2] | indeg[3]: 1→0 (push 3) | [0,3] |
| 4 | 0 | [4,5,2,0] | no outgoing edges | [3] |
| 5 | 3 | [4,5,2,0,3] | indeg[1]: 1→0 (push 1) | [1] |
| 6 | 1 | [4,5,2,0,3,1] | no outgoing edges | [] |
Result has all 6 nodes → valid order 4, 5, 2, 0, 3, 1.
DFS trace
Visit nodes in id order 0..5, pushing to a finish-stack on completion.
- DFS(0): no out-edges → finish 0. Stack: [0]
- DFS(1): no out-edges → finish 1. Stack: [0,1]
- DFS(2): visits 3 first (3→1 already black, skip) → finish 3, then finish 2. Stack: [0,1,3,2]
- DFS(3): already black, skip.
- DFS(4): edges to 0 (black) and 1 (black), nothing new → finish 4. Stack: [0,1,3,2,4]
- DFS(5): edges to 2 (black) and 0 (black) → finish 5. Stack: [0,1,3,2,4,5]
Popping the stack gives order 5, 4, 2, 3, 1, 0 — a different but equally valid topological order (both satisfy every edge constraint; topological order is not unique).
Pitfalls
- Cycles silently ignored: if you forget to check
order.size() == n(Kahn's) or track the gray/recursion-stack state (DFS), a cyclic graph produces a partial, wrong order instead of an error. - Stale shared state across calls: a flag like
hasCyclethat lives outside the function (a static/instance field) must be reset at the start of every call; otherwise one cycle-containing run poisons every subsequent call, even on a valid DAG. Prefer returning the flag as part of the result instead of mutating shared state. - White/black-only DFS coloring: using just visited/unvisited (2 colors) instead of white/gray/black cannot distinguish "currently being explored above me on the stack" (a real back-edge/cycle) from "already fully explored elsewhere" (a harmless cross-edge in a DAG) — this is the single most common correctness bug in DFS-based topo sort.
- Off-by-one on push timing: in the DFS version, pushing a node to the finish-stack before recursing into its neighbors (instead of after) produces a completely invalid order — the push must happen on finish, not on discovery.
- Disconnected components: a single DFS call or a single BFS seed will miss nodes in other components; you must loop over all nodes and start DFS/seed the queue from every unvisited/in-degree-0 node, not just node 0.
- Non-uniqueness surprises: multiple valid orders usually exist (see the two different results above); don't assume there is one "correct" answer to check against — verify by checking every edge points forward in your output order instead.
- Negative or weighted edges are irrelevant here — topo sort ignores weights entirely; conflating it with shortest-path algorithms (which do care about edge weights/signs) is a category error.
When to use / when not, and trade-offs
Use topological sort when: you have a DAG of dependencies and need any valid linear execution order, or you need to detect whether a cyclic dependency makes the task impossible (circular build dependency, deadlocked course prerequisites, a cyclic import graph).
Do not use it when: edges have weights and you need the cheapest/longest path through the DAG — topo order is a building block for DAG shortest/longest path (process nodes in topo order, relax edges once each), not a replacement for it. Also skip it if the graph is undirected, or if you need all valid orderings (that requires backtracking, not a single linear pass).
| Approach | Time | Space | Cycle detection | Style |
|---|---|---|---|---|
| Kahn's (in-degree + queue) | O(V+E) | O(V) | leftover nodes after queue empties | iterative, easy to explain, gives BFS-like "levels" for free |
| DFS finish-time | O(V+E) | O(V) | gray node revisited (back-edge) | recursive, natural if you already have DFS infrastructure; risk of stack overflow on very deep/skewed graphs |
| Naive repeated scan | O(V·(V+E)) | O(V) | no progress in a full pass | simplest to write, too slow beyond small inputs |
Prefer Kahn's when you also want to process nodes level-by-level (e.g. parallel task scheduling: everything in one queue-pop "wave" can run concurrently) or when recursion depth is a concern. Prefer DFS-based when you're already running DFS for other reasons (e.g. computing SCCs via Tarjan/Kosaraju, which uses the same finish-time idea).
Takeaways
- Topological sort only exists for DAGs; its core job is often really cycle detection disguised as ordering.
- Kahn's algorithm peels off in-degree-0 sources with a queue; DFS achieves the same result by reversing post-order finish times — both run in O(V+E) time and O(V) extra space.
- The order is generally not unique — validate any candidate answer by checking that every edge points forward, not by string-matching a "canonical" order.
- It's a building block: DAG shortest/longest path algorithms process nodes in topological order and relax each edge exactly once.
Recall question
You run Kahn's algorithm on a graph with 10 nodes and the final output list has only 7 nodes. What do you conclude, and what do the missing 3 nodes have in common?
Authored for this guide; concepts and complexity analysis follow the standard treatment in CLRS (Cormen, Leiserson, Rivest, Stein), "Introduction to Algorithms", chapter on Elementary Graph Algorithms, and Kahn (1962), "Topological sorting of large networks".
🤖 Don't fully get this? Learn it with Claude
Stuck on Topological Sort? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.
Build the mental picture, not memorization.
I just read a lesson on **Topological Sort** (DSA) and want to truly understand it. Explain Topological Sort from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Socratic — adapts to where you're stuck.
Teach me **Topological Sort** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Active recall exposes what you missed.
Quiz me on **Topological Sort** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Intuition + hook + flashcards for long-term memory.
Help me remember **Topological Sort** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.