Hashing
Step 6 in the DSA path · 7 concepts · 14 problems
📘 Learn Hashing from zero
Hashing trades memory for speed: you store data in a hash map or hash set so that lookups, inserts, and membership checks run in O(1) average time instead of the O(n) you would pay scanning a list. The core trick is to compute a key from each item and use it as an index into a table. The trigger to reach for hashing is any time you find yourself asking "have I seen this before?", "how many times does X appear?", or "which items share some property?" — especially when a naive solution needs a nested loop comparing every pair. Here we trace Group Anagrams: given strs = ["eat","tea","tan","ate","nat","bat"], group strings that are rearrangements of each other. The insight: two words are anagrams iff their sorted letters are identical, so the sorted string becomes a hash key that funnels anagrams into the same bucket.
✨ Added by the guide to build intuition — not from the source course.
🏗️ Visual walkthrough — trace it step by step
input: ["eat", "tea", "tan", "ate", "nat", "bat"]
idx: 0 1 2 3 4 5
key(word) = sorted letters of word
"eat" -> "aet" "tan" -> "ant"
"tea" -> "aet" "bat" -> "abt"
map (key -> list of words): { } <- empty
We create an empty hash map map whose key is the sorted-letter signature and whose value is the list of original words sharing that signature. Why it is safe: sorting letters is a canonical form — two words are anagrams if and only if their sorted forms are byte-for-byte equal, so the key uniquely identifies an anagram group without comparing words pairwise.
word = "eat" key = sort("eat") = "aet"
"aet" in map? NO -> create new bucket
map:
+-------+-----------------+
| key | words |
+-------+-----------------+
| "aet" | ["eat"] | <- new
+-------+-----------------+
The key "aet" is not present, so we create a fresh bucket and append "eat". Why it is safe: a hash map gives us O(1) average lookup for "does this key exist?" — we never scan existing buckets, we jump straight to the slot the key hashes to.
word = "tea" key = sort("tea") = "aet"
"aet" in map? YES -> append to existing bucket
map:
+-------+--------------------+
| key | words |
+-------+--------------------+
| "aet" | ["eat", "tea"] | <- grew
+-------+--------------------+
"tea" sorts to "aet", the same key as "eat", so it lands in the same bucket. Why it is safe: identical sorted signatures guarantee the two words use exactly the same multiset of letters, so they are genuine anagrams — grouping them is correct, not a hash-collision accident.
word = "tan" key = sort("tan") = "ant"
"ant" in map? NO -> create new bucket
map:
+-------+--------------------+
| key | words |
+-------+--------------------+
| "aet" | ["eat", "tea"] |
| "ant" | ["tan"] | <- new
+-------+--------------------+
"tan" sorts to "ant", a key we have not seen, so it starts its own bucket. Why it is safe: a different signature means a different letter multiset, so "tan" cannot belong with the "aet" group — keeping it separate is exactly what we want.
word = "ate" key = sort("ate") = "aet"
"aet" in map? YES -> append
map:
+-------+---------------------------+
| key | words |
+-------+---------------------------+
| "aet" | ["eat", "tea", "ate"] | <- grew
| "ant" | ["tan"] |
+-------+---------------------------+
"ate" also sorts to "aet" and joins eat/tea. Why it is safe: notice we did O(1) work to place a third anagram — a nested-loop approach would have compared "ate" against every prior word; the hash key collapses all of that into a single lookup.
word = "nat" key = sort("nat") = "ant"
"ant" in map? YES -> append
map:
+-------+---------------------------+
| key | words |
+-------+---------------------------+
| "aet" | ["eat", "tea", "ate"] |
| "ant" | ["tan", "nat"] | <- grew
+-------+---------------------------+
"nat" sorts to "ant" and merges with "tan". Why it is safe: sorting is order-independent, so tan and nat — which differ only by letter order — collapse to the identical canonical key and are correctly grouped.
word = "bat" key = sort("bat") = "abt"
"abt" in map? NO -> create new bucket
map:
+-------+---------------------------+
| key | words |
+-------+---------------------------+
| "aet" | ["eat", "tea", "ate"] |
| "ant" | ["tan", "nat"] |
| "abt" | ["bat"] | <- new
+-------+---------------------------+
"bat" sorts to "abt", which contains a b no other word has, so it forms a singleton bucket. Why it is safe: the presence of a unique letter guarantees a unique signature, so "bat" is provably in no existing group.
all 6 words processed. Return map.values():
[ ["eat", "tea", "ate"], <- key "aet"
["tan", "nat"], <- key "ant"
["bat"] ] <- key "abt"
Time: O(n * k log k) n = #words, k = max word length
Space: O(n * k) all words stored once in buckets
We discard the keys and return just the lists of grouped words. Why it is safe / why hashing wins: each word was placed in exactly one bucket via a single O(1) average map operation, so the whole grouping is O(n·k·log k) (the log k is the per-word sort), versus the O(n²·k) of pairwise comparison. The hash map turned a "compare everything to everything" problem into a "compute a key and drop it in a slot" problem.
🎯 Guided practice
- Easy — Contains Duplicate II. Given
numsandk, return true if there are two equal values whose indices differ by at mostk.Reasoning: The brute force compares every pair within a window — O(n·k). Better: keep a HashMap
value → last index seen. (1) Scan left to right. (2) At indexi, ifnums[i]is already in the map andi − map[nums[i]] ≤ k, return true. (3) Otherwise (or after the check) writemap[nums[i]] = i— always overwrite, since the most recent index gives the smallest future gap. (4) End of scan → false. Core pattern: a hash map ofseen-value → metadata (here, index)lets you check a relationship in O(1) per element, turning O(n·k) into O(n) time, O(min(n,k)) space (the sliding-window-set variant bounds the map at k+1 entries; the plain last-index map above is O(n) space unless you evict). - Medium — Group Anagrams. Group words that are anagrams of each other, e.g.
["eat","tea","tan","ate","nat","bat"]→[["eat","tea","ate"],["tan","nat"],["bat"]].Reasoning: Anagrams share a canonical signature — the same letters regardless of order. (1) Pick a signature that is identical for all anagrams of a word: the sorted string (
"eat"→"aet","tea"→"aet") — or, faster, a 26-length count tuple of letter frequencies used as the key. (2) Use a HashMapsignature → list of words. (3) For each word, compute its signature and append the word to that bucket. (4) Return the map's values. Core pattern: "group by a property" = map a derived key to a list; choosing a canonical key that is equal exactly when items belong together is the whole game. Sorting keys gives O(n·m·log m); the count-tuple key gives O(n·m) for n words of average length m.
✨ Added by the guide — work these before the full problem set.
Lessons in this topic
- ○Introduction to Hash Tables
- ○Hashing, Collisions, Overflow, and Resizing in Hashtables
- ○Using Hashtable in Different Programming Languages
- ○Introduction to HashSets
- ○Using HashSets in Different Programming Languages
- ○Introduction to HashSets
- ○Using HashSets in Different Programming Languages
- ○easy Contains Duplicate II
- ○easy Word Pattern
- ○easy Problem 1 First Non-repeating Character
- ○easy Problem 2 Largest Unique Number
- ○easy Problem 3 Maximum Number of Balloons
- ○easy Problem 4 Longest Palindrome
- ○easy Problem 5 Ransom Note
- ○easy Problem 1 Counting Elements
- ○easy Problem 2 Jewels and Stones
- ○easy Problem 3 Unique Number of Occurrences
- ○medium Determine if Two Strings Are Close
- ○medium Group Anagrams
- ○medium Longest Consecutive Sequence
- ○medium Problem 4 Longest Substring Without Repeating Characters