Knowledge Guide
HomeDSAAdvanced Patterns

medium Top K Frequent Words

Problem Statement

Given an array of strings words and an integer k, return the k most frequent strings.

The answer should be sorted by the frequency of the words from highest to lowest. If multiple words have the same frequency, sort them in lexicographical order.

Examples

Example 1

Example 2

Example 3

Constraints:

Try it yourself

Try solving this question here:

✅ Solution Top K Frequent Words

Problem Statement

Given an array of strings words and an integer k, return the k most frequent strings.

The answer should be sorted by the frequency of the words from highest to lowest. If multiple words have the same frequency, sort them in lexicographical order.

Examples

Example 1

  • Input: words = ["apple", "banana", "apple", "orange", "banana", "apple"], k = 2
  • Expected Output: ["apple", "banana"]
  • Justification: "apple" appears 3 times, "banana" appears 2 times, and "orange" appears once. Since k is 2, we return the two most frequent words: "apple" and "banana".

Example 2

  • Input: words = ["dog", "cat", "mouse", "cat", "dog", "dog"], k = 1
  • Expected Output: ["dog"]
  • Justification: "dog" appears 3 times, "cat" appears 2 times, and "mouse" appears once. Since k is 1, we return the most frequent word: "dog".

Example 3

  • Input: words = ["hello", "world", "hello", "coding", "hello", "world"], k = 3
  • Expected Output: ["hello", "world", "coding"]
  • Justification: "hello" appears 3 times, "world" appears 2 times, and "coding" appears once. Since k is 3, we return the three most frequent words: "hello", "world", and "coding".

Constraints:

  • 1 <= words.length <= 500
  • 1 <= words[i].length <= 10
  • words[i] consists of lowercase English letters.
  • k is in the range [1, The number of unique words[i]]

Solution

To solve this problem, we use a trie data structure to efficiently handle the words' frequency and lexicographical order. The algorithm first counts the frequency of each word using a hashmap. Then, we use an array of trie nodes (buckets) to store words based on their frequency. Each bucket corresponds to a frequency count, and the words in each bucket are stored in a trie to maintain lexicographical order. This helps in grouping words by frequency and allows efficient retrieval of words in lexicographical order. After that, we traverse the trie nodes from the highest to lowest frequency and extract words.

This approach is efficient compared to sorting or using heaps because it eliminates the need for repeated comparisons during sorting. The trie structure ensures that words are naturally sorted lexicographically, and the bucket array allows us to directly access words based on their frequency. This reduces the time complexity significantly.

Step-by-Step Algorithm

  1. Initialize Variables:

    • Store the value of k, the number of top frequent words to find.
    • Initialize result as an empty list to store the top K frequent words.
    • Create an array bucket of TrieNode to hold Trie nodes for different frequencies.
    • Initialize a map count to count the frequency of each word.
  2. Count Word Frequencies:

    • For each word in words:
      • Increment the count of the word in the count map. This map will store the frequency of each word.
  3. Populate Buckets with Words:

    • Loop through each entry (word, frequency) in the hashmap count:
      • If bucket[frequency] is null, create a new trie node at bucket[frequency].
      • Add the word to the trie node at bucket[frequency]. This helps in grouping words by frequency and allows efficient retrieval of words in lexicographical order.
  4. Add Word to Trie:

    • Define a method addWord to add a word to the trie:
      • Initialize a pointer current to the root of the trie.
      • Loop through each character c in the word:
        • If current.children[c - 'a'] is null, create a new trie node at current.children[c - 'a'].
        • Move the pointer current to current.children[c - 'a'].
      • Mark the end of the word by setting current.isEnd to true.
  5. Collect Words from Trie:

    • Define a method getWords to retrieve words from the trie in lexicographical order:
      • If k is 0, return to stop further processing.
      • If root.isEnd is true, add the current prefix to the result list and decrement k.
      • Loop through each index i from 0 to 25:
        • If root.children[i] is not null, recursively call getWords with the updated prefix.
  6. Traverse Buckets:

    • Loop through the bucket array from the highest frequency to the lowest (i.e., from n to 1):
      • If bucket[i] is not null, call the getWords method to collect words from the trie node at bucket[i].
      • If k becomes 0, break out of the loop.
  7. Return Result:

    • Return the result list containing the top k frequent words.

Algorithm Walkthrough

Using the input words = ["apple", "banana", "apple", "orange", "banana", "apple"], k = 2:

  1. Count Frequencies:

    • Create a frequency dictionary:
      • {"apple": 3, "banana": 2, "orange": 1}
  2. Create Buckets:

    • Create an array of trie nodes (buckets) of size 7 (length of words array + 1):
      • [null, null, null, null, null, null, null]
  3. Populate Buckets:

    • Insert "apple" (frequency 3) into bucket[3]:
      • TrieNode at bucket[3]: "apple"
    • Insert "banana" (frequency 2) into bucket[2]:
      • TrieNode at bucket[2]: "banana"
    • Insert "orange" (frequency 1) into bucket[1]:
      • TrieNode at bucket[1]: "orange"
  4. Collect Words:

    • Traverse buckets from highest (index 6) to lowest (index 1):
      • bucket[6] to bucket[4]: empty
      • bucket[3]: contains "apple"
        • Add "apple" to the result list.
        • Decrease k to 1.
      • bucket[2]: contains "banana"
        • Add "banana" to the result list.
        • Decrease k to 0.
      • bucket[1]: not traversed as k is already 0.
  5. Return Result:

    • Return the result list: ["apple", "banana"]

Code

java
import java.util.*;

// Trie node class
// class TrieNode {
//     TrieNode[] children; // Children nodes for each letter
//     boolean isEnd; // Flag to mark the end of a word

//     public TrieNode() {
//         children = new TrieNode[26]; // 26 letters in the alphabet
//         isEnd = false; // Initialize as not the end of a word
//     }
// }

public class Solution {

  private int k; // Number of top frequent words to find
  private List<String> result; // List to store top K frequent words

  // Method to get the top K frequent words
  public List<String> topKFrequent(String[] words, int k) {
    this.k = k;
    result = new ArrayList<>();
    int n = words.length;
    TrieNode[] bucket = new TrieNode[n + 1]; // Array to hold Trie nodes for different frequencies
    Map<String, Integer> count = new HashMap<>(); // Map to count word frequencies

    // Count the frequency of each word
    for (String word : words) {
      count.put(word, count.getOrDefault(word, 0) + 1);
    }

    // Add words to the appropriate trie bucket based on their frequency
    for (var entry : count.entrySet()) {
      int freq = entry.getValue();
      if (bucket[freq] == null) {
        bucket[freq] = new TrieNode(); // Create a new TrieNode if not present
      }
      addWord(bucket[freq], entry.getKey()); // Add the word to the TrieNode
    }

    // Traverse buckets from highest to lowest frequency to get top K words
    for (int i = n; i > 0; i--) {
      if (bucket[i] != null) {
        getWords(bucket[i], ""); // Get words from the TrieNode
      }
      if (this.k == 0) {
        break; // Stop if we have found enough words
      }
    }
    return result;
  }

  // Method to add a word to the trie
  private void addWord(TrieNode root, String word) {
    TrieNode current = root; // Start from the root of the Trie
    for (char c : word.toCharArray()) {
      // Traverse the Trie, create nodes if necessary
      if (current.children[c - 'a'] == null) {
        current.children[c - 'a'] = new TrieNode();
      }
      current = current.children[c - 'a']; // Move to the next node
    }
    current.isEnd = true; // Mark the end of the word
  }

  // Method to retrieve words from the trie in lexicographical order
  private void getWords(TrieNode root, String prefix) {
    if (k == 0) {
      return; // Stop if we have found enough words
    }
    if (root.isEnd) {
      k--;
      result.add(prefix); // Add the word to the result list
    }
    for (int i = 0; i < 26; i++) {
      // Recursively get words from child nodes
      if (root.children[i] != null) {
        getWords(root.children[i], prefix + (char) (i + 'a'));
      }
    }
  }

  // Main method for testing the solution
  public static void main(String[] args) {
    Solution solution = new Solution();
    System.out.println(
      solution.topKFrequent(
        new String[] {
          "apple",
          "banana",
          "apple",
          "orange",
          "banana",
          "apple",
        },
        2
      )
    ); // ["apple", "banana"]
    System.out.println(
      solution.topKFrequent(
        new String[] { "dog", "cat", "mouse", "cat", "dog", "dog" },
        1
      )
    ); // ["dog"]
    System.out.println(
      solution.topKFrequent(
        new String[] { "hello", "world", "hello", "coding", "hello", "world" },
        3
      )
    ); // ["hello", "world", "coding"]
  }
}

Complexity Analysis

Time Complexity

  1. Counting Frequencies: This step involves iterating through the array of words once and storing their frequencies in a hashmap. This operation has a time complexity of , where n is the number of words.
  2. Building Trie Buckets: For each word in the hashmap, we insert it into the corresponding trie bucket. In the worst case, this involves inserting all characters of all words into the trie, which results in a time complexity of , where l is the average length of the words.
  3. Retrieving Words: This step involves traversing the trie from the highest frequency bucket to the lowest. In the worst case, we might traverse all the nodes in the trie, which again has a time complexity of .

Thus, the overall time complexity is . Here, l <= 10, which behaves like a constant so the time complexity is .

Space Complexity

  1. Frequency HashMap: The hashmap requires space to store the frequency of each unique word.
  2. Trie Buckets: Each trie node can have up to 26 children. In the worst case, the space required for the trie nodes is proportional to the total number of characters in all the words, resulting in space.

Thus, the overall space complexity is . Here, l <= 10, which works as a constant so the space complexity is .

🤖 Don't fully get this? Learn it with Claude

Stuck on Top K Frequent Words? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🪜 Hint ladder (no spoilers)

Progressively stronger hints — you still solve it.

I'm working on the problem **Top K Frequent Words** (DSA). Give me a HINT LADDER: start with the tiniest nudge, then wait. Only reveal the next, stronger hint when I ask. Do NOT show the full solution unless I type 'show solution'. Keep me doing the thinking. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
🎨 Explain the approach visually

See the technique, not just code.

Explain the optimal approach to **Top K Frequent Words** with a VISUAL walkthrough: trace it on a small concrete example using ASCII art / a step-by-step diagram, narrate what changes each step, then give time & space complexity with a one-line derivation. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
🔍 Review my solution

Catch bugs, edge cases, sub-optimality.

I'll paste my solution to **Top K Frequent Words**. Review it for correctness, missed edge cases, and time/space complexity, then coach me toward the optimal — don't just rewrite it. Ask me to paste my code now. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.
🔁 Drill the pattern

Lock in recognition with look-alikes.

Give me 2 problems that use the SAME underlying pattern as **Top K Frequent Words**. For each, let me attempt first, then review my answer and name the trigger signal that reveals the pattern. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

📝 My notes