hard Minimize Malware Spread

Problem Statement

You are given a network of n nodes, represented as an n x n adjacency matrix called graph. In this graph, graph[i][j] = 1 indicates that node i is directly connected to node j. If graph[i][j] = 0, there is no direct connection between them.

Some nodes in this network are initially infected by malware, and these nodes are listed in the array initial. When a node is infected, the malware spreads to all nodes that are directly connected to it. This spread continues until no more nodes can be infected.

Suppose M(initial) is the final number of nodes infected with malware in the entire network after the spread of malware stops. We will remove exactly one node from the initial.

Return the node that, if removed, would minimize M(initial). If there is more than one possible node that can be removed to achieve this goal, return the node with the smallest index.

Note: Even if a node is removed from the initial list, it might still become infected later due to the spread of malware from other connected nodes.

Example

Example 1:

Input: graph:

[[1, 0, 0],  
 [0, 1, 1],  
 [0, 1, 1]]

initial: [0, 1]

Expected Output: 1
Justification: In this example, 1 is connected with 2. If we remove 0, there will be 2 infected nodes which are 1 and 2. If we remove 1, there will be only 1 infected node, which is 0.

Example 2:

Input: graph:

[[1, 1, 0, 0],  
 [1, 1, 0, 0],  
 [0, 0, 1, 1],  
 [0, 0, 1, 1]]

initial: [1, 2]

Expected Output: 1

Justification: Here, the graph contains 2 components. In the first component 0 and 1 are connected and in the 2nd component 2 and 3 are connected. So, if we remove either 1 or 2, the number of infected nodes will be reduced from 4 to 2. However, since node 1 has a smaller index, we choose node 1.

Example 3:

Input: graph:

[[1, 0, 0, 1],  
 [0, 1, 1, 0],  
 [0, 1, 1, 1],  
 [1, 0, 1, 1]]

initial: [0, 2]

Expected Output: 0

Justification: Initially, all 4 nodes are infected with Malware. If we remove either 0 or 2, still, all nodes remain infected. For example, if we remove 0, then node 2 will be infected as it is connected to 3 and 3 is connected to the 0. Also, the same goes for the 0. So, we remove the element with smallest index, which is 0.

Try it yourself

Try solving this question here:

✅ Solution Minimize Malware Spread

Problem Statement

Suppose M(initial) is the final number of nodes infected with malware in the entire network after the spread of malware stops. We will remove exactly one node from the initial.

Return the node that, if removed, would minimize M(initial). If there is more than one possible node that can be removed to achieve this goal, return the node with the smallest index.

Note: Even if a node is removed from the initial list, it might still become infected later due to the spread of malware from other connected nodes.

Example

Example 1:

Input: graph:

[[1, 0, 0],  
 [0, 1, 1],  
 [0, 1, 1]]

initial: [0, 1]

Expected Output: 1

Justification: In this example, 1 is connected with 2. If we remove 0, there will be 2 infected nodes which are 1 and 2. If we remove 1, there will be only 1 infected node, which is 0.

Example 2:

Input: graph:

[[1, 1, 0, 0],  
 [1, 1, 0, 0],  
 [0, 0, 1, 1],  
 [0, 0, 1, 1]]

initial: [1, 2]

Expected Output: 1

Justification: Here, the graph contains 2 components. In the first component 0 and 1 are connected and in the 2nd component 2 and 3 are connected. So, if we remove either 1 or 2, the number of infected nodes will be reduced from 4 to 2. However, since node 1 has a smaller index, we choose node 1.

Example 3:

Input: graph:

[[1, 0, 0, 1],  
 [0, 1, 1, 0],  
 [0, 1, 1, 1],  
 [1, 0, 1, 1]]

initial: [0, 2]

Expected Output: 0
Justification: Initially, all 4 nodes are infected with Malware. If we remove either 0 or 2, still, all nodes remain infected. For example, if we remove 0, then node 2 will be infected as it is connected to 3 and 3 is connected to the 0. Also, the same goes for the 0. So, we remove the element with smallest index, which is 0.

Solution

To solve the problem, we utilize Tarjan's algorithm to identify all the strongly connected components (SCCs) in the network graph. Tarjan's algorithm is an efficient way to find SCCs using Depth-First Search (DFS). Once we find all the SCCs, we map each node to its respective component and calculate the size of each SCC. For each SCC, we also count how many initially infected nodes it contains. The key idea is that removing an initially infected node from an SCC that has only one such node will minimize the total number of infected nodes, as it prevents further spread within that SCC.

After finding all SCCs and counting the infected nodes in each, we iterate through the initially infected nodes to determine which node's removal minimizes the spread the most. We prioritize nodes belonging to SCCs that contain exactly one infected node. Among these, we select the node whose removal results in the largest number of nodes saved from infection. If there is a tie, we choose the node with the smallest index to satisfy the problem's constraints.

Step-by-Step Algorithm

Initialize Variables:
- Define an integer n as the size of the graph, which represents the total number of nodes.
- Create arrays discovery and low of size n to store the discovery times and the lowest reachable discovery time for each node, respectively. Initialize them to -1 to indicate that no nodes have been visited yet.
- Create a parent array of size n to keep track of the parent nodes during DFS traversal, initializing all values to -1.
- Create a boolean array inStack of size n to check if a node is currently in the stack. Initialize all values to false.
- Create a Stack<Integer> named stack to store the nodes in the current DFS path.
- Create an ArrayList<List<Integer>> named scc to hold all strongly connected components (SCCs).
Find SCCs Using Tarjan's Algorithm:
- Loop through each node i from 0 to n - 1. If discovery[i] is -1, this node has not been visited. Call the tarjanSCC function to find all SCCs starting from this node i.
- Inside the tarjanSCC function:
  - Set the discovery[u] and low[u] values to the current time (time++) for the node u being visited. This marks the node's discovery time.
  - Push node u onto the stack and mark inStack[u] as true.
  - Loop through all nodes v from 0 to n - 1. If there is a direct edge from u to v (graph[u][v] == 1), perform the following:
    - If v has not been visited (discovery[v] == -1):
      - Set parent[v] to u and call tarjanSCC recursively for v.
      - Update the low value of u as low[u] = min(low[u], low[v]) to keep track of the lowest discovery time reachable from u.
    - If v is already in the stack (inStack[v] is true), it means a back edge is found. Update low[u] as low[u] = min(low[u], discovery[v]).
  - After visiting all the neighbors of u, check if u is the root of an SCC (low[u] == discovery[u]). If true, pop nodes from the stack until u is reached, forming an SCC, and add this SCC to the scc list.
Map Nodes to Their SCCs and Count Sizes:
- Create three arrays: infectedCount to store the number of infected nodes in each SCC, componentSize to store the size of each SCC, and componentMap to map each node to its respective SCC.
- Loop through each SCC in scc and for each node in the SCC, update componentMap to map the node to its SCC index and increment the size of that SCC in componentSize.
Count Infected Nodes in Each SCC:
- Loop through the initial array, which contains initially infected nodes. For each node, find its SCC using componentMap and increment the count in infectedCount for that SCC.
Sort the Initial Array:
- Sort the initial array in ascending order to prioritize removing nodes with smaller indices in case of a tie.
Determine the Best Node to Remove:
- Initialize variables result to the first element of the sorted initial array and maxSavedNodes to -1.
- Loop through each node in the initial array:
  - Find the SCC of the node using componentMap.
  - If the SCC contains exactly one infected node (infectedCount[comp] == 1):
    - Check if the size of this SCC (componentSize[comp]) is greater than maxSavedNodes. If true, update maxSavedNodes and set result to the current node.
Return the Result:
- Return result as the node whose removal minimizes the total number of infected nodes.

Algorithm Walkthrough

Input:

graph:

[[1, 0, 0],  
 [0, 1, 1],  
 [0, 1, 1]]

initial: [0, 1]

Step-by-Step Walkthrough:

Initialize Variables:
- n = 3 (number of nodes in the graph).
- discovery = [-1, -1, -1] (initially all nodes are undiscovered).
- low = [-1, -1, -1] (initially all nodes have no lowest reachable discovery time).
- parent = [-1, -1, -1] (initially all nodes have no parent).
- inStack = [false, false, false] (no nodes are in the stack initially).
- stack = [] (empty stack for DFS traversal).
- scc = [] (empty list to store SCCs).
Finding SCCs using Tarjan's Algorithm:
- Start with Node 0:
  - discovery[0] = 0, low[0] = 0 (set discovery and low time to 0).
  - Push 0 onto the stack: stack = [0], inStack[0] = true.
  - Loop through all nodes v:
    - For v = 0: graph[0][0] = 1 but u == v, skip.
    - For v = 1 and v = 2: graph[0][1] = 0 and graph[0][2] = 0, no direct connections, continue.
  - Since low[0] == discovery[0], node 0 is the root of an SCC:
    - Pop 0 from the stack: stack = [], inStack[0] = false.
    - Create SCC [0] and add to scc: scc = [[0]].
- Continue with Node 1:
  - discovery[1] = 1, low[1] = 1 (set discovery and low time to 1).
  - Push 1 onto the stack: stack = [1], inStack[1] = true.
  - Loop through all nodes v:
    - For v = 0: graph[1][0] = 0, no direct connection.
    - For v = 1: graph[1][1] = 1 but u == v, skip.
    - For v = 2: graph[1][2] = 1, direct connection exists:
      - Since discovery[2] = -1, node 2 is unvisited. Set parent[2] = 1 and call tarjanSCC for node 2.
  - DFS on Node 2:
    - discovery[2] = 2, low[2] = 2 (set discovery and low time to 2).
    - Push 2 onto the stack: stack = [1, 2], inStack[2] = true.
    - Loop through all nodes v:
      - For v = 0 and v = 1: graph[2][0] = 0, graph[2][1] = 1 but v = 1 is already visited and in stack. Update low[2] = min(low[2], discovery[1]) = min(2, 1) = 1.
      - For v = 2: graph[2][2] = 1 but u == v, skip.
    - Since low[2] == discovery[2], node 2 is part of an SCC:
      - Pop 2 from the stack: stack = [1], inStack[2] = false.
      - Create SCC [2] and add to scc: scc = [[0], [2]].
  - Back to Node 1:
    - After DFS on 2, update low[1] = min(low[1], low[2]) = min(1, 1) = 1.
    - Since low[1] == discovery[1], node 1 is the root of an SCC:
      - Pop 1 from the stack: stack = [], inStack[1] = false.
      - Create SCC [1, 2] and add to scc: scc = [[0], [1, 2]].
Map Nodes to Their SCCs and Count Sizes:
- componentMap = [0, 1, 1] (node 0 in SCC 0, nodes 1 and 2 in SCC 1).
- componentSize = [1, 2] (SCC 0 has 1 node, SCC 1 has 2 nodes).
Count Infected Nodes in Each SCC:
- Loop through initial = [0, 1]:
  - For node = 0, comp = componentMap[0] = 0. Increment infectedCount[0] = 1.
  - For node = 1, comp = componentMap[1] = 1. Increment infectedCount[1] = 1.
Sort the Initial Array:
- initial = [0, 1] (already sorted).
Determine the Best Node to Remove:
- Initialize result = 0 and maxSavedNodes = -1.
- Loop through initial = [0, 1]:
  - For node = 0, comp = componentMap[0] = 0:
    - infectedCount[0] == 1 (only one malware node in SCC 0).
    - componentSize[0] = 1 > maxSavedNodes = -1. Update maxSavedNodes = 1 and result = 0.
  - For node = 1, comp = componentMap[1] = 1:
    - infectedCount[1] == 1 (only one malware node in SCC 1).
    - componentSize[1] = 2 > maxSavedNodes = 1. Update maxSavedNodes = 2 and result = 1.
Return the Result:
- The best node to remove is 1, as it minimizes the total infected nodes. Return 1.

Code

java

import java.util.*;

class Solution {

  private int time = 0;

  public int minimizeMalwareSpread(int[][] graph, int[] initial) {
    int n = graph.length;
    int[] discovery = new int[n]; // Time of discovery of nodes
    int[] low = new int[n]; // Lowest point that can be reached
    int[] parent = new int[n]; // Parent of nodes in DFS
    boolean[] inStack = new boolean[n]; // To check if a node is in the stack
    Stack<Integer> stack = new Stack<>();
    Arrays.fill(discovery, -1);
    Arrays.fill(low, -1);
    Arrays.fill(parent, -1);

    List<List<Integer>> scc = new ArrayList<>(); // List to hold all SCCs

    // Tarjan's algorithm to find SCCs
    for (int i = 0; i < n; i++) {
      if (discovery[i] == -1) {
        tarjanSCC(i, graph, discovery, low, parent, inStack, stack, scc);
      }
    }

    int[] infectedCount = new int[scc.size()];
    int[] componentSize = new int[scc.size()];
    int[] componentMap = new int[n];

    // Map nodes to their SCC and count the size of each SCC
    for (int i = 0; i < scc.size(); i++) {
      for (int node : scc.get(i)) {
        componentMap[node] = i;
        componentSize[i]++;
      }
    }

    // Count infected nodes in each SCC
    for (int node : initial) {
      int comp = componentMap[node];
      infectedCount[comp]++;
    }

    Arrays.sort(initial);
    int result = initial[0];
    int maxSavedNodes = -1;

    // Determine the best node to remove
    for (int node : initial) {
      int comp = componentMap[node];
      if (infectedCount[comp] == 1) {
        // Only one malware node in this SCC
        if (componentSize[comp] > maxSavedNodes) {
          maxSavedNodes = componentSize[comp];
          result = node;
        }
      }
    }

    return result;
  }

  private void tarjanSCC(
    int u,
    int[][] graph,
    int[] discovery,
    int[] low,
    int[] parent,
    boolean[] inStack,
    Stack<Integer> stack,
    List<List<Integer>> scc
  ) {
    discovery[u] = low[u] = time++;
    stack.push(u);
    inStack[u] = true;

    for (int v = 0; v < graph.length; v++) {
      if (graph[u][v] == 1 && u != v) {
        // There's an edge from u to v
        if (discovery[v] == -1) {
          // If v is not visited
          parent[v] = u;
          tarjanSCC(v, graph, discovery, low, parent, inStack, stack, scc);

          low[u] = Math.min(low[u], low[v]); // Update the low-link value of u
        } else if (inStack[v]) {
          // Back edge found
          low[u] = Math.min(low[u], discovery[v]); // Update the low-link value of u
        }
      }
    }

    // If u is a root node, pop the stack and form an SCC
    if (low[u] == discovery[u]) {
      List<Integer> currentSCC = new ArrayList<>();
      while (true) {
        int v = stack.pop();
        inStack[v] = false;
        currentSCC.add(v);
        if (v == u) break;
      }
      scc.add(currentSCC);
    }
  }

  public static void main(String[] args) {
    Solution sol = new Solution();

    int[][] graph1 = { { 1, 0, 0 }, { 0, 1, 1 }, { 0, 1, 1 } };
    int[] initial1 = { 0, 1 };
    System.out.println(sol.minimizeMalwareSpread(graph1, initial1)); // Output: 1

    int[][] graph2 = {
      { 1, 1, 0, 0 },
      { 1, 1, 0, 0 },
      { 0, 0, 1, 1 },
      { 0, 0, 1, 1 },
    };
    int[] initial2 = { 1, 2 };
    System.out.println(sol.minimizeMalwareSpread(graph2, initial2)); // Output: 1

    int[][] graph3 = {
      { 1, 0, 0, 1 },
      { 0, 1, 1, 0 },
      { 0, 1, 1, 1 },
      { 1, 0, 1, 1 },
    };
    int[] initial3 = { 0, 2 };
    System.out.println(sol.minimizeMalwareSpread(graph3, initial3)); // Output: 0
  }
}

Complexity Analysis:

Time Complexity

Finding Strongly Connected Components (SCCs):
The Tarjan's algorithm used in the solution runs a Depth-First Search (DFS) on the graph. Since each node and each edge are visited exactly once, the time complexity for this part is , where V is the number of nodes (vertices) and E is the number of edges. In the worst case, we can have N vertices and N² edges.
Mapping Nodes to SCCs and Counting Sizes/Infected Nodes:
After identifying all SCCs, the solution iterates over the nodes to map them to their respective SCCs and compute the sizes of the SCCs. This takes time.
Sorting the Initial Infected Nodes:
Sorting the initial array takes time, where N is the number of initially infected nodes if we assume initially all vertices are infected.
Determining the Best Node to Remove:
This involves iterating over the initial array, which takes time.

Thus, the overall time complexity of the solution is , which is almost equal to .

Space Complexity

Storage for Discovery, Low, Parent, In-Stack Arrays:
Each of these arrays takes space, where N is the number of nodes.
Stack for DFS and SCC List:
The stack and list to store SCCs both require space in the worst case.
Component Mapping Arrays:
Arrays such as infectedCount, componentSize, and componentMap also require space.

Thus, the overall space complexity is .

🤖 Don't fully get this? Learn it with Claude

Stuck on Minimize Malware Spread? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🪜 Hint ladder (no spoilers)

Progressively stronger hints — you still solve it.

I'm working on the problem **Minimize Malware Spread** (DSA). Give me a HINT LADDER: start with the tiniest nudge, then wait. Only reveal the next, stronger hint when I ask. Do NOT show the full solution unless I type 'show solution'. Keep me doing the thinking. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🎨 Explain the approach visually

See the technique, not just code.

Explain the optimal approach to **Minimize Malware Spread** with a VISUAL walkthrough: trace it on a small concrete example using ASCII art / a step-by-step diagram, narrate what changes each step, then give time & space complexity with a one-line derivation. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🔍 Review my solution

Catch bugs, edge cases, sub-optimality.

I'll paste my solution to **Minimize Malware Spread**. Review it for correctness, missed edge cases, and time/space complexity, then coach me toward the optimal — don't just rewrite it. Ask me to paste my code now. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🔁 Drill the pattern

Lock in recognition with look-alikes.

Give me 2 problems that use the SAME underlying pattern as **Minimize Malware Spread**. For each, let me attempt first, then review my answer and name the trigger signal that reveals the pattern. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

📝 My notes

← Minimum Number of Days to Disconne Minimize Malware Spread II →