Knowledge Guide
HomeDSA

Heap

Step 12 in the DSA path · 4 concepts · 9 problems

0 / 13 complete

📘 Learn Heap from zero

A heap is a binary tree kept in an array where every parent beats its children (max-heap: parent largest; min-heap: parent smallest), so the extreme element sits at the root and is readable in O(1), with push/pop in O(log n). Reach for a heap whenever you repeatedly need the smallest or largest of a changing set — top-K, merging K sorted lists, scheduling by priority, or running statistics over a stream. The canonical example is Find Median from a Data Stream: keep a max-heap for the lower half and a min-heap for the upper half so the median is always at the two roots.

✨ Added by the guide to build intuition — not from the source course.

🏗️ Visual walkthrough — trace it step by step

Step 1 — The two-heap invariant
lo = MAX-heap (lower half, root = largest of low)
hi = MIN-heap (upper half, root = smallest of high)

INVARIANTS we must always restore:
  (1) every value in lo  <=  every value in hi
  (2) size(lo) == size(hi)   OR   size(lo) == size(hi)+1

      lo (max-heap)        hi (min-heap)
      [   empty   ]        [   empty   ]
         root=-                root=-

Median rule:
  odd  total -> lo.root
  even total -> (lo.root + hi.root) / 2

We split the data so the two middle elements live at the two roots. Because lo holds the smaller half (largest on top) and hi holds the larger half (smallest on top), the median is always one peek (or two) away — no sorting needed. Stream to process: 5, 15, 1, 3.

Step 2 — Insert 5
add 5: lo empty -> push to lo, then rebalance.

  lo (max): [5]          hi (min): []
             ^root=5                 root=-

sizes: lo=1, hi=0  -> invariant (2) holds (lo = hi+1)
roots: lo.root=5  (no hi to compare) -> invariant (1) holds

total=1 (odd) -> median = lo.root = 5

First element goes to lo. With one element the lone value is trivially the median. The push is O(log n) and the peek that gives the median is O(1) — this is the whole point of using heaps here.

Step 3 — Insert 15 (push, then cross-balance)
15 <= lo.root(5)? no -> belongs in upper half, push to hi.

  lo (max): [5]          hi (min): [15]
             ^root=5                 ^root=15

Check invariant (1): lo.root(5) <= hi.root(15)  OK
Check invariant (2): lo=1, hi=1  -> equal, OK

total=2 (even) -> median = (5 + 15)/2 = 10

Each new value is routed by comparing to lo's root: anything not <= it goes to the upper heap. Because the roots already satisfy lo.root <= hi.root and sizes are equal, no rebalance is needed — the invariants are safe.

Step 4 — Insert 1 (push to lo)
1 <= lo.root(5) -> belongs in lower half, push to lo.

  lo (max): [5,1]        hi (min): [15]
            root=5                  ^root=15
            sift: 1<5, stays child

      5            
     /             
    1              

sizes: lo=2, hi=1  -> lo is ahead by 1, still OK (lo = hi+1)
roots: lo.root(5) <= hi.root(15)  OK

total=3 (odd) -> median = lo.root = 5

1 is small, so it sinks into lo; the max-heap sift keeps 5 on top. Sizes are lo=2, hi=1, which is the allowed "lo leads by one" state, so the single middle element is exactly lo's root. Safe — no rebalance.

Step 5 — Insert 3: wrong-heap landing forces a rebalance
3 <= lo.root(5) -> push to lo first.

  lo (max): [5,1,3]      hi (min): [15]
      5
     / \
    1   3      sizes: lo=3, hi=1  <-- VIOLATES invariant (2)!
               (lo leads by 2, allowed max is 1)

The naive push overfills lo. Whenever size(lo) > size(hi)+1 we must move lo's root across to hi — this is the rebalance step that keeps the two halves the same size.

Step 6 — Rebalance: pop lo.root, push it to hi
pop max from lo (=5), push 5 into hi.

  lo (max): [3,1]        hi (min): [5,15]
      3                       5
     /                       \
    1                         15

WHY this is safe:
 - 5 was the LARGEST of the low half, so it is <= everything
   already in hi -> invariant (1) still holds (3 <= 5).
 - sizes now lo=2, hi=2  -> invariant (2) holds.

total=4 (even) -> median = (lo.root + hi.root)/2
               = (3 + 5)/2 = 4

Moving lo's root is always legal because it was the boundary value — the biggest of the small half is the smallest thing that can join the big half. Each transfer is O(log n), and after it the two roots straddle the true median: sorted data is 1,3,5,15 → median (3+5)/2 = 4.

Step 7 — General insert algorithm (the reusable template)
function addNum(x):
   if lo empty OR x <= lo.root:  lo.push(x)
   else:                         hi.push(x)

   # rebalance sizes (lo may lead by at most 1)
   if size(lo) > size(hi)+1:  hi.push(lo.popMax())
   elif size(hi) > size(lo):  lo.push(hi.popMin())

function findMedian():
   if size(lo) > size(hi):  return lo.root
   else:                    return (lo.root + hi.root)/2

Cost: addNum O(log n), findMedian O(1)

The whole pattern reduces to: route by comparison, then restore the size invariant by shuffling one root across. Replaying 5,15,1,3 through this gives running medians 5 → 10 → 5 → 4, exactly matching our trace — the heap roots are the streaming median for free.

🎯 Guided practice

Problem 1 (easy) — "Take Gifts From the Richest Pile" (greedy-with-a-heap). Given gift piles [25, 64, 9, 4, 100], repeat 4 times: take the richest pile, replace it with floor(sqrt(pile)), put it back. Return the total remaining.

  1. Recognize the signal: "repeatedly take the largest, modify, reinsert" → priority queue. We need the max each round over changing data → a max-heap.
  2. Build: heapify all gifts into a max-heap in O(n) (in most languages: negate values into a min-heap). Heap = {100,64,25,9,4}.
  3. Round 1: pop 100, push floor(sqrt(100))=10. Round 2: pop 64, push 8. Round 3: pop 25, push 5. Round 4: heap is {10,9,8,5,4}, pop 10, push 3.
  4. Result: remaining heap holds {9,8,5,4,3} → sum = 29. Each round is one pop + one push = O(log n); total O(n + k log n), space O(n). Lesson: the heap silently re-orders after every mutation, so the next "max" is always one peek away.

Problem 2 (medium) — Find the Median of a Number Stream (Two Heaps). Numbers arrive one at a time; after each, report the median. Stream: 5, 2, 8, 1.

  1. Recognize the signal: "running median over a stream" → the Two Heaps pattern. Keep a max-heap lo for the smaller half and a min-heap hi for the larger half, with the median at their seam.
  2. Invariant: every element in lo ≤ every element in hi, and 0 ≤ size(lo) - size(hi) ≤ 1. To add x: push x onto lo, move lo's max to hi, then if hi is now larger than lo, move hi's min back to lo. This keeps both halves correctly split at the seam.
  3. Trace (heaps shown as multisets). Add 5: lo={5}, hi={}. Median = 5. Add 2: lo={2}, hi={5}. Median = (2+5)/2 = 3.5. Add 8: lo={2,5}, hi={8}. Median = top of lo = 5. Add 1: lo={1,2}, hi={5,8}. Median = (2+5)/2 = 3.5.
  4. Read the median in O(1): if lo is larger, it's lo's root; if sizes are equal, average the two roots. Insert is O(log n) (rebalancing is a constant number of heap ops); space O(n). Lesson: when one extreme isn't enough but the middle is, two opposing heaps turn an O(n) median scan into O(log n) per element.

✨ Added by the guide — work these before the full problem set.

Lessons in this topic