Knowledge Guide

Storage Engines — B-tree vs LSM-tree

Two ways to lay bytes on disk

Under every index and table sits a storage engine, and most belong to one of two families with opposite trade-offs: B-tree (update in place, read-optimized) and LSM-tree (append and compact, write-optimized). The choice largely determines whether a database favors fast, predictable reads or sustained heavy writes.

B-tree (Postgres, InnoDB, most RDBMS)

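The B+tree from the indexing lesson, persisted to disk: a write locates the target page and updates it in place, after first appending a WAL entry for durability. A read walks from the root to a leaf, which is O(log n) page accesses and in practice only a few seeks because the high fanout keeps the tree shallow. This suits read-heavy and transactional workloads. The cost is on the write side: updates are random I/O, a small change rewrites a whole page, and a full page has to split (write amplification).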
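To make "a few seeks" concrete, here is a rough sketch of how many levels a B-tree needs for a given number of keys. The function name `btree_height` and the fanout of 500 entries per page are illustrative assumptions, not figures from any particular database, and real pages are rarely completely full.

```python
def btree_height(n_keys: int, fanout: int) -> int:
    """Smallest number of levels whose capacity covers n_keys.

    Assumes every page holds exactly `fanout` entries. Real pages are
    usually only partly full, so a real tree can be one level taller.
    """
    if n_keys < 1 or fanout < 2:
        raise ValueError("need n_keys >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout  # each extra level multiplies capacity by the fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # Hypothetical fanout of 500 entries per page.
    for n in (1_000, 1_000_000, 1_000_000_000):
        print(f"{n:>13,} keys -> {btree_height(n, fanout=500)} levels")
```

Under these assumptions a billion keys fit in four levels, which is why a point read touches only a handful of pages.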

LSM-tree: writes append to an in-memory memtable, flush to sorted SSTables on disk, and background compaction merges them; reads check the memtable plus SSTables

LSM-tree (Cassandra, RocksDB, LevelDB, Bigtable)

Optimized for write throughput:

  1. A write is appended to a WAL on disk (sequential I/O, so very fast) and inserted into a sorted in-memory memtable.
  2. When the memtable fills, it's flushed to an immutable sorted file on disk (SSTable).
  3. Background compaction merges SSTables, discarding overwritten/deleted keys.
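The three steps above can be sketched as a toy in-memory model. Everything here is a simplification for illustration: the class name `TinyLSM`, the `memtable_limit` parameter, and the use of Python lists in place of on-disk files are assumptions, and real engines such as RocksDB compact incrementally across levels rather than merging everything at once.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

TOMBSTONE = object()  # marker written in place of a value when a key is deleted


class TinyLSM:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.wal: List[Tuple[str, Any]] = []             # stands in for the on-disk log
        self.memtable: Dict[str, Any] = {}
        self.sstables: List[List[Tuple[str, Any]]] = []  # oldest first, each sorted by key

    def put(self, key: str, value: Any) -> None:
        self.wal.append((key, value))  # step 1: log first, then update the memtable
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key: str) -> None:
        self.put(key, TOMBSTONE)  # deletes are writes too

    def flush(self) -> None:
        """Step 2: freeze the memtable into an immutable sorted run."""
        if not self.memtable:
            return
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.wal.clear()  # the flushed data no longer needs the log

    def compact(self) -> None:
        """Step 3: merge all runs, keeping the newest value for each key."""
        merged: Dict[str, Any] = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table)
        # A full merge sees every older copy, so tombstones can be dropped.
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.sstables = [live] if live else []

    @staticmethod
    def _visible(value: Any) -> Optional[Any]:
        return None if value is TOMBSTONE else value

    def get(self, key: str) -> Optional[Any]:
        if key in self.memtable:
            return self._visible(self.memtable[key])
        for table in reversed(self.sstables):  # newest run first
            # (key,) sorts just before any (key, value) pair with the same key.
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return self._visible(table[i][1])
        return None


if __name__ == "__main__":
    db = TinyLSM(memtable_limit=2)
    db.put("a", 1)
    db.put("b", 2)   # memtable full: flushed as the first run
    db.put("a", 3)
    db.delete("b")   # memtable full again: flushed as the second run
    print(db.get("a"), db.get("b"), len(db.sstables))
    db.compact()
    print(db.sstables)
```

Note that `get` has to walk the runs from newest to oldest, which is exactly the read cost the next paragraph describes.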

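Reads pay the cost: a key may live in the memtable or in any SSTable, so a read checks the memtable first and then SSTables from newest to oldest until it finds the key. Two things mitigate this: bloom filters, which let a read skip SSTables that definitely do not hold the key, and the sorted layout, which makes the lookup inside each SSTable fast.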
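A minimal sketch of the bloom filter idea, assuming one filter per SSTable. The class name `BloomFilter`, the sizes, and the choice of seeded SHA-256 hashing are illustrative assumptions; production engines use faster non-cryptographic hashes and tune the bits per key.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter()
    for k in ("user:1", "user:2", "user:3"):
        bf.add(k)
    print(bf.might_contain("user:2"))  # always True for a key that was added
```

A read would consult `might_contain` before opening an SSTable and skip the file on `False`. A `True` can be a false positive, so the SSTable still has to be searched in that case.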

B-tree vs LSM — the trade

| | B-tree | LSM-tree |
|---|---|---|
| Writes | update in place, random I/O | append-only, sequential; faster, higher throughput |
| Reads | few seeks, predictable | may merge several SSTables (bloom filters help) |
| Write amplification | page splits + WAL | compaction rewrites data repeatedly |
| Best for | read-heavy, transactional (OLTP) | write-heavy ingest, time-series, logs |
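Write amplification is the ratio of bytes physically written to storage to bytes the application asked to write. The back-of-envelope sketch below uses hypothetical numbers (a 100-byte row, an 8 KiB page, ten compaction rewrites) and deliberately crude models: the B-tree case assumes a page is written back after a single row change, and the LSM case assumes each row is written once to the WAL, once at flush, and once per compaction pass. Real engines batch and tune both sides, so treat the outputs as illustrations of the mechanism, not as benchmarks.

```python
def write_amplification(bytes_to_disk: int, bytes_from_app: int) -> float:
    """Bytes physically written divided by bytes the application wrote."""
    return bytes_to_disk / bytes_from_app


def btree_update_wa(row_bytes: int, page_bytes: int) -> float:
    # Assumed model: one WAL record plus one full page rewritten per update.
    return write_amplification(row_bytes + page_bytes, row_bytes)


def lsm_write_wa(row_bytes: int, compaction_rewrites: int) -> float:
    # Assumed model: WAL write + memtable flush + one rewrite per compaction pass.
    return write_amplification(row_bytes * (2 + compaction_rewrites), row_bytes)


if __name__ == "__main__":
    # Hypothetical sizes chosen only to make the arithmetic easy to follow.
    print(f"B-tree: {btree_update_wa(row_bytes=100, page_bytes=8192):.2f}x")
    print(f"LSM:    {lsm_write_wa(row_bytes=100, compaction_rewrites=10):.2f}x")
```

Both families amplify writes; they differ in where the extra bytes come from (whole-page rewrites versus repeated compaction) and in whether the I/O is random or sequential.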

Takeaways


Re-authored for this guide; LSM-tree diagram hand-authored as SVG. Follows DDIA ch. 3 and the RocksDB/LevelDB design docs. See also: How Indexes Work (B+tree), Bloom Filters (System Design).
