How Indexes Work — B+tree Internals

Why a lookup needs an index at all

WHERE id = 42 on an unindexed table is a full table scan — read every row, O(n). An index is a separate sorted structure that turns that into O(log n). Almost every relational database uses a B+tree for it — here's why and how.

Why a B+tree, not a binary tree or a hash

Disks (and SSDs) read in pages (~8 KB), and the cost is the number of page reads. A binary tree wastes that — one key per node, ~30 levels for a billion rows = ~30 reads. A B+tree packs hundreds of keys into one page (high fanout), so the tree is only ~3–4 levels deep for a billion rows → ~3–4 page reads per lookup. A hash index does equality in O(1) but can't do ranges or ORDER BY — which is why B+tree is the default.

A B+tree: root routing node points to sorted leaf nodes linked left-to-right; a lookup of 60 descends root to the middle leaf, range scans follow the leaf links

Structure & the two operations

Internal nodes hold only routing keys + child pointers (which subtree to descend).
Leaf nodes hold the actual keys + row pointers, and are chained as a sorted doubly-linked list.

Point lookup (=): start at the root, follow the pointer for your key's range down to the leaf — ~3–4 reads. Range scan / ORDER BY / BETWEEN: descend to the start leaf, then walk the leaf links sideways — this sideways chain is exactly why B+tree beats hash for ranges and sorted output.

Inserts stay balanced via splits

Insert into a leaf; if it overflows, split it in two and push the middle key up to the parent (which may split too, up to the root). This keeps every leaf the same depth — lookups stay O(log n) forever. The cost: writes touch the index too (and may split pages) — write amplification.

Clustered vs secondary (the hop that surprises people)

Clustered index (InnoDB primary key): the table rows are the leaf nodes — data stored in key order. One lookup, you're done.
Secondary index: its leaf stores the PK (or row-id), not the row. So WHERE email=? finds the PK in the email index, then does a second lookup in the clustered index to fetch the row — a "bookmark lookup." A covering index avoids it (next page).

Takeaways

Index = sorted B+tree → O(log n); high fanout means ~3–4 page reads even for billions of rows.
Leaves are a linked list → B+tree (not hash) for ranges/ORDER BY; hash only for equality.
Every index adds write cost (updates + splits) — index deliberately, not reflexively.
Secondary-index lookups may need a second hop to the clustered index.

Re-authored for this guide; B+tree diagram hand-authored as SVG. Follows CMU 15-445 (Andy Pavlo) and DDIA ch. 3. See also: Indexes in Practice, How a Query Executes.

🤖 Don't fully get this? Learn it with Claude

Stuck on How Indexes Work — B+tree Internals? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🎨 Explain it visually

Build the mental picture, not memorization.

I just read a lesson on **How Indexes Work — B+tree Internals** (Databases) and want to truly understand it. Explain How Indexes Work — B+tree Internals from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🤔 Walk me through it (interactive)

Socratic — adapts to where you're stuck.

Teach me **How Indexes Work — B+tree Internals** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧪 Quiz me & fix my gaps

Active recall exposes what you missed.

Quiz me on **How Indexes Work — B+tree Internals** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🧠 Make it stick

Intuition + hook + flashcards for long-term memory.

Help me remember **How Indexes Work — B+tree Internals** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

📝 My notes

Indexes in Practice — When They're →