Storage Engines — B-tree vs LSM-tree
Two ways to lay bytes on disk
Under every index and table is a storage engine, and almost all of them belong to one of two families with opposite trade-offs: B-tree (update in place, read-optimized) and LSM-tree (append & compact, write-optimized). The choice largely decides whether a database excels at reads or at heavy writes.
B-tree (Postgres, InnoDB, most RDBMS)
The B+tree from the indexing lesson, persisted: a write finds the page and updates it in place (plus a WAL entry for durability). Reads are a few seeks (~O(log n) page reads). Great for reads and transactional workloads. Writes, however, do random I/O, rewrite a whole page even for a one-row change, and may split full pages, all of which adds write amplification.
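To put numbers on "a few seeks" and on page-level write amplification, here is a back-of-the-envelope sketch in Python. The fanout of 500 pointers per page and the 100-byte row are assumed, illustrative values. 8 KiB is Postgres' default page size (InnoDB defaults to 16 KiB). The helpers `btree_levels` and `page_write_amplification` are hypothetical names, and the model ignores the WAL and page splits.

```python
# Back-of-the-envelope B-tree model (assumed, illustrative numbers; not
# measured from any real engine).

def btree_levels(num_keys: int, fanout: int) -> int:
    """Smallest height h with fanout**h >= num_keys, i.e. root-to-leaf page reads."""
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout  # each extra level multiplies reachable keys by the fanout
        levels += 1
    return levels


def page_write_amplification(row_bytes: int, page_bytes: int) -> float:
    """Bytes physically written per byte changed when the whole page is rewritten in place."""
    return page_bytes / row_bytes


if __name__ == "__main__":
    # Assumed: ~500 child pointers per page, one billion keys.
    print(btree_levels(1_000_000_000, 500))     # 4
    # Assumed: a 100-byte row living in an 8 KiB page.
    print(page_write_amplification(100, 8192))  # 81.92
```

Under these assumptions a billion keys sit only 4 levels deep, and because the upper levels usually stay cached, fewer than 4 of those page reads may actually hit disk. The same arithmetic shows the write-side cost: changing 100 bytes in place rewrites the full 8 KiB page.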
LSM-tree (Cassandra, RocksDB, LevelDB, Bigtable)
Optimized for write throughput:
- A write is appended to a WAL (sequential disk I/O, for durability) and inserted into an in-memory sorted memtable, so no disk page has to be read or rewritten. This is very fast.
- When the memtable fills, it's flushed to an immutable sorted file on disk (SSTable).
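- Background compaction merges SSTables, keeping only the newest version of each key and discarding overwritten values and deleted keys (a delete is written as a tombstone marker, which compaction eventually drops).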
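The write path above can be sketched as a toy in-memory model. This is a minimal Python illustration under stated assumptions, not how RocksDB or Cassandra are implemented: there is no WAL and no on-disk file, each "SSTable" is a sorted Python list, `None` stands in for a tombstone, and `ToyLSM` and `memtable_limit` are hypothetical names. Compaction here is one full merge of every SSTable, so tombstones can be dropped safely because no older copy of the key survives elsewhere; real engines compact incrementally (for example by level or by size tier) and must keep a tombstone until it has shadowed all older data.

```python
from bisect import bisect_left
from typing import Optional

TOMBSTONE = None  # hypothetical encoding: a deleted key maps to None


class ToyLSM:
    """Toy in-memory LSM-tree: memtable -> flush to sorted SSTables -> compaction.
    No WAL, no real files; each 'SSTable' is an immutable sorted list of (key, value)."""

    def __init__(self, memtable_limit: int = 3):
        self.memtable: dict[str, Optional[str]] = {}
        self.memtable_limit = memtable_limit
        self.sstables: list[list[tuple[str, Optional[str]]]] = []  # oldest first

    def put(self, key: str, value: Optional[str]) -> None:
        self.memtable[key] = value  # a real engine also appends this to the WAL
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key: str) -> None:
        self.put(key, TOMBSTONE)  # a delete is just another write (a tombstone)

    def flush(self) -> None:
        """Write the memtable out as a new immutable sorted run."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest first; first hit wins
            i = bisect_left(table, (key,))  # binary search on the sorted run
            if i < len(table) and table[i][0] == key:
                return table[i][1]  # a tombstone (None) means "deleted"
        return None

    def compact(self) -> None:
        """Full compaction: merge every run, keep the newest value per key, drop tombstones."""
        merged: dict[str, Optional[str]] = {}
        for table in self.sstables:  # oldest to newest, so newer values overwrite older
            merged.update(table)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.sstables = [live] if live else []


if __name__ == "__main__":
    db = ToyLSM(memtable_limit=2)
    db.put("a", "1")
    db.put("b", "2")  # memtable full -> flushed as SSTable #1
    db.put("a", "3")
    db.delete("b")    # flushed as SSTable #2, holding a tombstone for "b"
    print(db.get("a"), db.get("b"), len(db.sstables))  # 3 None 2
    db.compact()
    print(db.sstables)  # [[('a', '3')]]
```

Note that the overwrite of "a" and the delete of "b" never touch SSTable #1; the stale versions simply linger until compaction rewrites both runs into one.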
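Reads are the cost: a key may be in the memtable or in any SSTable, so a read checks the memtable first, then the SSTables from newest to oldest, and stops at the first match. A read may therefore touch several files; bloom filters (which let it skip SSTables that definitely don't hold the key) and the sorted layout (binary search within each SSTable) keep this cheap.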
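The read path and its Bloom-filter shortcut can be sketched in Python too. This is a hedged illustration, not any engine's real API: `BloomFilter`, `make_sstable` and `lsm_get` are hypothetical names, each SSTable is modelled as a dict, and SHA-256 with a per-hash index prefix is just an easy way to derive k hash positions (production engines use faster non-cryptographic hashes and a packed bit array). The counter reports how many SSTables a lookup would actually have to read.

```python
import hashlib
from typing import Iterator, Optional


class BloomFilter:
    """Minimal Bloom filter: k hash positions in an m-slot array.
    False positives are possible; false negatives are not."""

    def __init__(self, m_bits: int = 1024, k: int = 3):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)  # one byte per "bit", for readability over compactness

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))


def make_sstable(items: dict[str, str]) -> tuple[BloomFilter, dict[str, str]]:
    """Hypothetical stand-in for an SSTable plus the Bloom filter built when it is flushed."""
    bloom = BloomFilter()
    for key in items:
        bloom.add(key)
    return bloom, dict(items)


def lsm_get(
    key: str,
    memtable: dict[str, str],
    sstables: list[tuple[BloomFilter, dict[str, str]]],  # oldest first
) -> tuple[Optional[str], int]:
    """Memtable first, then SSTables newest to oldest. Returns (value, sstables_read)."""
    if key in memtable:
        return memtable[key], 0
    reads = 0
    for bloom, table in reversed(sstables):
        if not bloom.might_contain(key):
            continue  # definitely not in this SSTable: skip the read entirely
        reads += 1  # filter said "maybe": pay for the read
        if key in table:
            return table[key], reads
    return None, reads


if __name__ == "__main__":
    tables = [
        make_sstable({"a": "1", "b": "2"}),
        make_sstable({"c": "3"}),
        make_sstable({"d": "4"}),
    ]
    value, reads = lsm_get("a", {}, tables)
    print(value, reads)  # value is "1"; reads counts only tables whose filter said "maybe"
```

A filter that answers "no" lets the lookup skip that SSTable; a false positive only costs one wasted read, never a wrong answer, because the key is still checked in the table itself.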
B-tree vs LSM — the trade
| | B-tree | LSM-tree |
|---|---|---|
| Writes | update-in-place, random I/O | append-only, sequential — faster, higher throughput |
| Reads | few seeks, predictable | may merge several SSTables (bloom filters help) |
| Write amplification | whole-page rewrites, page splits + WAL | compaction rewrites data repeatedly |
| Best for | read-heavy, transactional (OLTP) | write-heavy ingest, time-series, logs |
Takeaways
- B-tree: update-in-place, read-optimized — the default for relational/OLTP databases.
- LSM: append + compact, write-optimized — reads cost more (bloom filters mitigate); great for heavy ingest.
- It's a read-vs-write amplification trade — pick the engine by your workload's read/write ratio.
Re-authored for this guide; LSM-tree diagram hand-authored as SVG. Follows DDIA ch. 3 and the RocksDB/LevelDB design docs. See also: How Indexes Work (B+tree), Bloom Filters (System Design).