Knowledge Guide
HomeDSA

Hashing

Step 6 in the DSA path · 7 concepts · 14 problems

0 / 21 complete

📘 Learn Hashing from zero

Hashing trades memory for speed: you store data in a hash map or hash set so that lookups, inserts, and membership checks run in O(1) average time instead of the O(n) you would pay scanning a list. The core trick is to compute a key from each item and use it as an index into a table. The trigger to reach for hashing is any time you find yourself asking "have I seen this before?", "how many times does X appear?", or "which items share some property?" — especially when a naive solution needs a nested loop comparing every pair. Here we trace Group Anagrams: given strs = ["eat","tea","tan","ate","nat","bat"], group strings that are rearrangements of each other. The insight: two words are anagrams iff their sorted letters are identical, so the sorted string becomes a hash key that funnels anagrams into the same bucket.

✨ Added by the guide to build intuition — not from the source course.

🏗️ Visual walkthrough — trace it step by step

Step 1 — Set up the map and the key idea
input:   ["eat", "tea", "tan", "ate", "nat", "bat"]
idx:        0      1      2      3      4      5

key(word) = sorted letters of word
  "eat" -> "aet"      "tan" -> "ant"
  "tea" -> "aet"      "bat" -> "abt"

map (key -> list of words):  {  }   <- empty

We create an empty hash map map whose key is the sorted-letter signature and whose value is the list of original words sharing that signature. Why it is safe: sorting letters is a canonical form — two words are anagrams if and only if their sorted forms are byte-for-byte equal, so the key uniquely identifies an anagram group without comparing words pairwise.

Step 2 — Process "eat" (idx 0)
word = "eat"   key = sort("eat") = "aet"

"aet" in map?  NO  -> create new bucket

map:
  +-------+-----------------+
  | key   | words           |
  +-------+-----------------+
  | "aet" | ["eat"]         |   <- new
  +-------+-----------------+

The key "aet" is not present, so we create a fresh bucket and append "eat". Why it is safe: a hash map gives us O(1) average lookup for "does this key exist?" — we never scan existing buckets, we jump straight to the slot the key hashes to.

Step 3 — Process "tea" (idx 1) — first collision into a bucket
word = "tea"   key = sort("tea") = "aet"

"aet" in map?  YES  -> append to existing bucket

map:
  +-------+--------------------+
  | key   | words              |
  +-------+--------------------+
  | "aet" | ["eat", "tea"]    |   <- grew
  +-------+--------------------+

"tea" sorts to "aet", the same key as "eat", so it lands in the same bucket. Why it is safe: identical sorted signatures guarantee the two words use exactly the same multiset of letters, so they are genuine anagrams — grouping them is correct, not a hash-collision accident.

Step 4 — Process "tan" (idx 2) — a brand-new group
word = "tan"   key = sort("tan") = "ant"

"ant" in map?  NO  -> create new bucket

map:
  +-------+--------------------+
  | key   | words              |
  +-------+--------------------+
  | "aet" | ["eat", "tea"]    |
  | "ant" | ["tan"]           |   <- new
  +-------+--------------------+

"tan" sorts to "ant", a key we have not seen, so it starts its own bucket. Why it is safe: a different signature means a different letter multiset, so "tan" cannot belong with the "aet" group — keeping it separate is exactly what we want.

Step 5 — Process "ate" (idx 3) — joins the first group
word = "ate"   key = sort("ate") = "aet"

"aet" in map?  YES  -> append

map:
  +-------+---------------------------+
  | key   | words                     |
  +-------+---------------------------+
  | "aet" | ["eat", "tea", "ate"]   |   <- grew
  | "ant" | ["tan"]                  |
  +-------+---------------------------+

"ate" also sorts to "aet" and joins eat/tea. Why it is safe: notice we did O(1) work to place a third anagram — a nested-loop approach would have compared "ate" against every prior word; the hash key collapses all of that into a single lookup.

Step 6 — Process "nat" (idx 4) — joins the second group
word = "nat"   key = sort("nat") = "ant"

"ant" in map?  YES  -> append

map:
  +-------+---------------------------+
  | key   | words                     |
  +-------+---------------------------+
  | "aet" | ["eat", "tea", "ate"]   |
  | "ant" | ["tan", "nat"]           |   <- grew
  +-------+---------------------------+

"nat" sorts to "ant" and merges with "tan". Why it is safe: sorting is order-independent, so tan and nat — which differ only by letter order — collapse to the identical canonical key and are correctly grouped.

Step 7 — Process "bat" (idx 5) — third distinct group
word = "bat"   key = sort("bat") = "abt"

"abt" in map?  NO  -> create new bucket

map:
  +-------+---------------------------+
  | key   | words                     |
  +-------+---------------------------+
  | "aet" | ["eat", "tea", "ate"]   |
  | "ant" | ["tan", "nat"]           |
  | "abt" | ["bat"]                  |   <- new
  +-------+---------------------------+

"bat" sorts to "abt", which contains a b no other word has, so it forms a singleton bucket. Why it is safe: the presence of a unique letter guarantees a unique signature, so "bat" is provably in no existing group.

Step 8 — Return the bucket values (final answer)
all 6 words processed. Return map.values():

  [ ["eat", "tea", "ate"],     <- key "aet"
    ["tan", "nat"],            <- key "ant"
    ["bat"] ]                  <- key "abt"

Time:  O(n * k log k)   n = #words, k = max word length
Space: O(n * k)         all words stored once in buckets

We discard the keys and return just the lists of grouped words. Why it is safe / why hashing wins: each word was placed in exactly one bucket via a single O(1) average map operation, so the whole grouping is O(n·k·log k) (the log k is the per-word sort), versus the O(n²·k) of pairwise comparison. The hash map turned a "compare everything to everything" problem into a "compute a key and drop it in a slot" problem.

🎯 Guided practice

  1. Easy — Contains Duplicate II. Given nums and k, return true if there are two equal values whose indices differ by at most k.

    Reasoning: The brute force compares every pair within a window — O(n·k). Better: keep a HashMap value → last index seen. (1) Scan left to right. (2) At index i, if nums[i] is already in the map and i − map[nums[i]] ≤ k, return true. (3) Otherwise (or after the check) write map[nums[i]] = i — always overwrite, since the most recent index gives the smallest future gap. (4) End of scan → false. Core pattern: a hash map of seen-value → metadata (here, index) lets you check a relationship in O(1) per element, turning O(n·k) into O(n) time, O(min(n,k)) space (the sliding-window-set variant bounds the map at k+1 entries; the plain last-index map above is O(n) space unless you evict).

  2. Medium — Group Anagrams. Group words that are anagrams of each other, e.g. ["eat","tea","tan","ate","nat","bat"][["eat","tea","ate"],["tan","nat"],["bat"]].

    Reasoning: Anagrams share a canonical signature — the same letters regardless of order. (1) Pick a signature that is identical for all anagrams of a word: the sorted string ("eat"→"aet", "tea"→"aet") — or, faster, a 26-length count tuple of letter frequencies used as the key. (2) Use a HashMap signature → list of words. (3) For each word, compute its signature and append the word to that bucket. (4) Return the map's values. Core pattern: "group by a property" = map a derived key to a list; choosing a canonical key that is equal exactly when items belong together is the whole game. Sorting keys gives O(n·m·log m); the count-tuple key gives O(n·m) for n words of average length m.

✨ Added by the guide — work these before the full problem set.

Lessons in this topic