medium Top K Frequent Words
Problem Statement
Given an array of strings words and an integer k, return the k most frequent strings.
The answer should be sorted by the frequency of the words from highest to lowest. If multiple words have the same frequency, sort them in lexicographical order.
Examples
Example 1
- Input: words = ["apple", "banana", "apple", "orange", "banana", "apple"], k =
2 - Expected Output: ["apple", "banana"]
- Justification: "apple" appears 3 times, "banana" appears 2 times, and "orange" appears once. Since k is 2, we return the two most frequent words: "apple" and "banana".
Example 2
- Input: words = ["dog", "cat", "mouse", "cat", "dog", "dog"], k =
1 - Expected Output: ["dog"]
- Justification: "dog" appears 3 times, "cat" appears 2 times, and "mouse" appears once. Since k is 1, we return the most frequent word: "dog".
Example 3
- Input: words = ["hello", "world", "hello", "coding", "hello", "world"], k =
3 - Expected Output: ["hello", "world", "coding"]
- Justification: "hello" appears 3 times, "world" appears 2 times, and "coding" appears once. Since k is 3, we return the three most frequent words: "hello", "world", and "coding".
Constraints:
- 1 <= words.length <= 500
- 1 <= words[i].length <= 10
words[i]consists of lowercase English letters.kis in the range [1, The number of unique words[i]]
Try it yourself
Try solving this question here:
✅ Solution Top K Frequent Words
Problem Statement
Given an array of strings words and an integer k, return the k most frequent strings.
The answer should be sorted by the frequency of the words from highest to lowest. If multiple words have the same frequency, sort them in lexicographical order.
Examples
Example 1
- Input: words = ["apple", "banana", "apple", "orange", "banana", "apple"], k =
2 - Expected Output: ["apple", "banana"]
- Justification: "apple" appears 3 times, "banana" appears 2 times, and "orange" appears once. Since k is 2, we return the two most frequent words: "apple" and "banana".
Example 2
- Input: words = ["dog", "cat", "mouse", "cat", "dog", "dog"], k =
1 - Expected Output: ["dog"]
- Justification: "dog" appears 3 times, "cat" appears 2 times, and "mouse" appears once. Since k is 1, we return the most frequent word: "dog".
Example 3
- Input: words = ["hello", "world", "hello", "coding", "hello", "world"], k =
3 - Expected Output: ["hello", "world", "coding"]
- Justification: "hello" appears 3 times, "world" appears 2 times, and "coding" appears once. Since k is 3, we return the three most frequent words: "hello", "world", and "coding".
Constraints:
- 1 <= words.length <= 500
- 1 <= words[i].length <= 10
words[i]consists of lowercase English letters.kis in the range [1, The number of unique words[i]]
Solution
To solve this problem, we use a trie data structure to efficiently handle the words' frequency and lexicographical order. The algorithm first counts the frequency of each word using a hashmap. Then, we use an array of trie nodes (buckets) to store words based on their frequency. Each bucket corresponds to a frequency count, and the words in each bucket are stored in a trie to maintain lexicographical order. This helps in grouping words by frequency and allows efficient retrieval of words in lexicographical order. After that, we traverse the trie nodes from the highest to lowest frequency and extract words.
This approach is efficient compared to sorting or using heaps because it eliminates the need for repeated comparisons during sorting. The trie structure ensures that words are naturally sorted lexicographically, and the bucket array allows us to directly access words based on their frequency. This reduces the time complexity significantly.
Step-by-Step Algorithm
-
Initialize Variables:
- Store the value of
k, the number of top frequent words to find. - Initialize
resultas an empty list to store the top K frequent words. - Create an array
bucketof TrieNode to hold Trie nodes for different frequencies. - Initialize a map
countto count the frequency of each word.
- Store the value of
-
Count Word Frequencies:
- For each
wordinwords:- Increment the count of the word in the
countmap. This map will store the frequency of each word.
- Increment the count of the word in the
- For each
-
Populate Buckets with Words:
- Loop through each entry (word, frequency) in the hashmap
count:- If
bucket[frequency]isnull, create a new trie node atbucket[frequency]. - Add the word to the trie node at
bucket[frequency]. This helps in grouping words by frequency and allows efficient retrieval of words in lexicographical order.
- If
- Loop through each entry (word, frequency) in the hashmap
-
Add Word to Trie:
- Define a method
addWordto add a word to the trie:- Initialize a pointer
currentto the root of the trie. - Loop through each character
cin the word:- If
current.children[c - 'a']isnull, create a new trie node atcurrent.children[c - 'a']. - Move the pointer
currenttocurrent.children[c - 'a'].
- If
- Mark the end of the word by setting
current.isEndtotrue.
- Initialize a pointer
- Define a method
-
Collect Words from Trie:
- Define a method
getWordsto retrieve words from the trie in lexicographical order:- If
kis0, return to stop further processing. - If
root.isEndistrue, add the current prefix to the result list and decrementk. - Loop through each index
ifrom0to25:- If
root.children[i]is notnull, recursively callgetWordswith the updated prefix.
- If
- If
- Define a method
-
Traverse Buckets:
- Loop through the
bucketarray from the highest frequency to the lowest (i.e., fromnto1):- If
bucket[i]is notnull, call thegetWordsmethod to collect words from the trie node atbucket[i]. - If
kbecomes0, break out of the loop.
- If
- Loop through the
-
Return Result:
- Return the
resultlist containing the topkfrequent words.
- Return the
Algorithm Walkthrough
Using the input words = ["apple", "banana", "apple", "orange", "banana", "apple"], k = 2:
-
Count Frequencies:
- Create a frequency dictionary:
- {"apple": 3, "banana": 2, "orange": 1}
- Create a frequency dictionary:
-
Create Buckets:
- Create an array of trie nodes (buckets) of size 7 (length of words array + 1):
- [null, null, null, null, null, null, null]
- Create an array of trie nodes (buckets) of size 7 (length of words array + 1):
-
Populate Buckets:
- Insert "apple" (frequency 3) into bucket[3]:
- TrieNode at bucket[3]: "apple"
- Insert "banana" (frequency 2) into bucket[2]:
- TrieNode at bucket[2]: "banana"
- Insert "orange" (frequency 1) into bucket[1]:
- TrieNode at bucket[1]: "orange"
- Insert "apple" (frequency 3) into bucket[3]:
-
Collect Words:
- Traverse buckets from highest (index 6) to lowest (index 1):
- bucket[6] to bucket[4]: empty
- bucket[3]: contains "apple"
- Add "apple" to the result list.
- Decrease k to 1.
- bucket[2]: contains "banana"
- Add "banana" to the result list.
- Decrease k to 0.
- bucket[1]: not traversed as k is already 0.
- Traverse buckets from highest (index 6) to lowest (index 1):
-
Return Result:
- Return the result list: ["apple", "banana"]
Code
import java.util.*;
// Trie node class
// class TrieNode {
// TrieNode[] children; // Children nodes for each letter
// boolean isEnd; // Flag to mark the end of a word
// public TrieNode() {
// children = new TrieNode[26]; // 26 letters in the alphabet
// isEnd = false; // Initialize as not the end of a word
// }
// }
public class Solution {
private int k; // Number of top frequent words to find
private List<String> result; // List to store top K frequent words
// Method to get the top K frequent words
public List<String> topKFrequent(String[] words, int k) {
this.k = k;
result = new ArrayList<>();
int n = words.length;
TrieNode[] bucket = new TrieNode[n + 1]; // Array to hold Trie nodes for different frequencies
Map<String, Integer> count = new HashMap<>(); // Map to count word frequencies
// Count the frequency of each word
for (String word : words) {
count.put(word, count.getOrDefault(word, 0) + 1);
}
// Add words to the appropriate trie bucket based on their frequency
for (var entry : count.entrySet()) {
int freq = entry.getValue();
if (bucket[freq] == null) {
bucket[freq] = new TrieNode(); // Create a new TrieNode if not present
}
addWord(bucket[freq], entry.getKey()); // Add the word to the TrieNode
}
// Traverse buckets from highest to lowest frequency to get top K words
for (int i = n; i > 0; i--) {
if (bucket[i] != null) {
getWords(bucket[i], ""); // Get words from the TrieNode
}
if (this.k == 0) {
break; // Stop if we have found enough words
}
}
return result;
}
// Method to add a word to the trie
private void addWord(TrieNode root, String word) {
TrieNode current = root; // Start from the root of the Trie
for (char c : word.toCharArray()) {
// Traverse the Trie, create nodes if necessary
if (current.children[c - 'a'] == null) {
current.children[c - 'a'] = new TrieNode();
}
current = current.children[c - 'a']; // Move to the next node
}
current.isEnd = true; // Mark the end of the word
}
// Method to retrieve words from the trie in lexicographical order
private void getWords(TrieNode root, String prefix) {
if (k == 0) {
return; // Stop if we have found enough words
}
if (root.isEnd) {
k--;
result.add(prefix); // Add the word to the result list
}
for (int i = 0; i < 26; i++) {
// Recursively get words from child nodes
if (root.children[i] != null) {
getWords(root.children[i], prefix + (char) (i + 'a'));
}
}
}
// Main method for testing the solution
public static void main(String[] args) {
Solution solution = new Solution();
System.out.println(
solution.topKFrequent(
new String[] {
"apple",
"banana",
"apple",
"orange",
"banana",
"apple",
},
2
)
); // ["apple", "banana"]
System.out.println(
solution.topKFrequent(
new String[] { "dog", "cat", "mouse", "cat", "dog", "dog" },
1
)
); // ["dog"]
System.out.println(
solution.topKFrequent(
new String[] { "hello", "world", "hello", "coding", "hello", "world" },
3
)
); // ["hello", "world", "coding"]
}
}
Complexity Analysis
Time Complexity
- Counting Frequencies: This step involves iterating through the array of words once and storing their frequencies in a hashmap. This operation has a time complexity of
, where n is the number of words. - Building Trie Buckets: For each word in the hashmap, we insert it into the corresponding trie bucket. In the worst case, this involves inserting all characters of all words into the trie, which results in a time complexity of
, where l is the average length of the words. - Retrieving Words: This step involves traversing the trie from the highest frequency bucket to the lowest. In the worst case, we might traverse all the nodes in the trie, which again has a time complexity of
.
Thus, the overall time complexity is l <= 10, which behaves like a constant so the time complexity is
Space Complexity
- Frequency HashMap: The hashmap requires
space to store the frequency of each unique word. - Trie Buckets: Each trie node can have up to 26 children. In the worst case, the space required for the trie nodes is proportional to the total number of characters in all the words, resulting in
space.
Thus, the overall space complexity is l <= 10, which works as a constant so the space complexity is
🤖 Don't fully get this? Learn it with Claude
Stuck on Top K Frequent Words? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.
Progressively stronger hints — you still solve it.
I'm working on the problem **Top K Frequent Words** (DSA). Give me a HINT LADDER: start with the tiniest nudge, then wait. Only reveal the next, stronger hint when I ask. Do NOT show the full solution unless I type 'show solution'. Keep me doing the thinking. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
See the technique, not just code.
Explain the optimal approach to **Top K Frequent Words** with a VISUAL walkthrough: trace it on a small concrete example using ASCII art / a step-by-step diagram, narrate what changes each step, then give time & space complexity with a one-line derivation. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Catch bugs, edge cases, sub-optimality.
I'll paste my solution to **Top K Frequent Words**. Review it for correctness, missed edge cases, and time/space complexity, then coach me toward the optimal — don't just rewrite it. Ask me to paste my code now. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
Lock in recognition with look-alikes.
Give me 2 problems that use the SAME underlying pattern as **Top K Frequent Words**. For each, let me attempt first, then review my answer and name the trigger signal that reveals the pattern. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.