How Indexes Work — B+tree Internals
Why a lookup needs an index at all
WHERE id = 42 on an unindexed table is a full table scan — read every row,
O(n). An index is a separate sorted structure that turns that into O(log n). Almost every relational
database uses a B+tree for it — here's why and how.
Why a B+tree, not a binary tree or a hash
Disks (and SSDs) read in pages (~8 KB), and the cost is the number of page reads. A binary tree wastes that — one key per node, ~30 levels for a billion rows = ~30 reads. A B+tree packs hundreds of keys into one page (high fanout), so the tree is only ~3–4 levels deep for a billion rows → ~3–4 page reads per lookup. A hash index does equality in O(1) but can't do ranges or ORDER BY — which is why B+tree is the default.
Structure & the two operations
- Internal nodes hold only routing keys + child pointers (which subtree to descend).
- Leaf nodes hold the actual keys + row pointers, and are chained as a sorted doubly-linked list.
Point lookup (=): start at the root, follow the pointer for your key's range down to the leaf — ~3–4 reads. Range scan / ORDER BY / BETWEEN: descend to the start leaf, then walk the leaf links sideways — this sideways chain is exactly why B+tree beats hash for ranges and sorted output.
Inserts stay balanced via splits
Insert into a leaf; if it overflows, split it in two and push the middle key up to the parent (which may split too, up to the root). This keeps every leaf the same depth — lookups stay O(log n) forever. The cost: writes touch the index too (and may split pages) — write amplification.
Clustered vs secondary (the hop that surprises people)
- Clustered index (InnoDB primary key): the table rows are the leaf nodes — data stored in key order. One lookup, you're done.
- Secondary index: its leaf stores the PK (or row-id), not the row. So
WHERE email=?finds the PK in the email index, then does a second lookup in the clustered index to fetch the row — a "bookmark lookup." A covering index avoids it (next page).
Takeaways
- Index = sorted B+tree → O(log n); high fanout means ~3–4 page reads even for billions of rows.
- Leaves are a linked list → B+tree (not hash) for ranges/ORDER BY; hash only for equality.
- Every index adds write cost (updates + splits) — index deliberately, not reflexively.
- Secondary-index lookups may need a second hop to the clustered index.
Re-authored for this guide; B+tree diagram hand-authored as SVG. Follows CMU 15-445 (Andy Pavlo) and DDIA ch. 3. See also: Indexes in Practice, How a Query Executes.
🤖 Don't fully get this? Learn it with Claude
Stuck on How Indexes Work — B+tree Internals? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.
Build the mental picture, not memorization.
I just read a lesson on **How Indexes Work — B+tree Internals** (Databases) and want to truly understand it. Explain How Indexes Work — B+tree Internals from first principles using ONE vivid real-world analogy and a visual mental model — draw it as ASCII art or a clear step-by-step diagram — with a concrete example using real numbers. Then ask me one question to check I got the mental picture, and wait for my reply. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Socratic — adapts to where you're stuck.
Teach me **How Indexes Work — B+tree Internals** interactively. Ask me ONE guiding question at a time, wait for my answer, and adapt to my confusion — build the idea with me step by step instead of explaining it all at once. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Active recall exposes what you missed.
Quiz me on **How Indexes Work — B+tree Internals** with 5 questions, easy to tricky, ONE at a time. Tell me if each answer is right; at the end, explain clearly what I got wrong and why. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Intuition + hook + flashcards for long-term memory.
Help me remember **How Indexes Work — B+tree Internals** for the long term: give the one-sentence intuition, a memorable hook/mnemonic, a tiny worked example, and 3 active-recall flashcards (Q -> A). If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.