hard Minimize Malware Spread II

Problem Statement

You are given a network of n nodes, represented as an n x n adjacency matrix called graph. In this graph, graph[i][j] = 1 indicates that node i is directly connected to node j. If graph[i][j] = 0, there is no direct connection between them.

Some nodes in this network are initially infected by malware, and these nodes are listed in the array initial. When a node is infected, the malware spreads to all nodes that are directly connected to it. This spread continues until no more nodes can be infected.

Suppose M(initial) is the final number of nodes infected with malware in the entire network after the spread of malware stops.

We will remove exactly one node from initial, completely removing it and any connections from this node to any other node.

Return the node that, if removed, would minimize M(initial). If there is more than one possible node that can be removed to achieve this goal, return the node with the smallest index.

Examples

Example 1:

Input: graph:

[[1, 0, 1],  
 [0, 1, 1],  
 [1, 1, 1]]

initial: [0, 2]

Expected Output: 2
Justification: By removing node 2 from the initial list, we stop the malware from spreading from node 2 to node 1. Thus, only node 0 remains infected, minimizing the total infected nodes to 1.

Example 2:

Input: graph:

[[1, 1, 0, 0],  
 [1, 1, 0, 0],  
 [0, 0, 1, 1],  
 [0, 0, 1, 1]]

initial: [1, 2]

Expected Output: 1

Justification: Removing node 1 from the initial list will prevent the malware from spreading in the first cluster. The total number of infected nodes will be minimized to 2 since only nodes 2 and 3 will remain infected.

Example 3:

Input: graph:

[[1, 0, 0, 1],  
 [0, 1, 1, 0],  
 [0, 1, 1, 1],  
 [1, 0, 1, 1]]

initial: [0, 2]

Expected Output: 2

Justification: If node 2 is removed, the malware spread will be restricted only to node 0 and node 3. This minimizes the spread of malware, reducing the total infected nodes to 2.

Try it yourself

Try solving this question here:

✅ Solution Minimize Malware Spread II

Problem Statement

Suppose M(initial) is the final number of nodes infected with malware in the entire network after the spread of malware stops.

We will remove exactly one node from initial, completely removing it and any connections from this node to any other node.

Return the node that, if removed, would minimize M(initial). If there is more than one possible node that can be removed to achieve this goal, return the node with the smallest index.

Examples

Example 1:

Input: graph:

[[1, 0, 1],  
 [0, 1, 1],  
 [1, 1, 1]]

initial: [0, 2]

Expected Output: 2

Justification: By removing node 2 from the initial list, we stop the malware from spreading from node 2 to node 1. Thus, only node 0 remains infected, minimizing the total infected nodes to 1.

Example 2:

Input: graph:

[[1, 1, 0, 0],  
 [1, 1, 0, 0],  
 [0, 0, 1, 1],  
 [0, 0, 1, 1]]

initial: [1, 2]

Expected Output: 1

Justification: Removing node 1 from the initial list will prevent the malware from spreading in the first cluster. The total number of infected nodes will be minimized to 2 since only nodes 2 and 3 will remain infected.

Example 3:

Input: graph:

[[1, 0, 0, 1],  
 [0, 1, 1, 0],  
 [0, 1, 1, 1],  
 [1, 0, 1, 1]]

initial: [0, 2]

Expected Output: 2

Justification: If node 2 is removed, the malware spread will be restricted only to node 0 and node 3. This minimizes the spread of malware, reducing the total infected nodes to 2.

Solution

To solve the problem, the approach involves using Tarjan's algorithm, which is typically employed to find strongly connected components in a graph. The algorithm performs a Depth-First Search (DFS) from each initially infected node to determine the impact of removing that node from the network. The DFS is used to compute the discovery times and the lowest reachable times for each node. These times help identify the connected components and how the malware spreads throughout the network.

The primary objective is to find the node whose removal results in the smallest number of infections. We use a recursive DFS function to traverse the graph from each infected node. During this traversal, we keep track of nodes visited, compute the size of each component affected by malware, and identify nodes that, when removed, minimize the spread of malware. The node with the maximum impact on reducing infections is chosen, and in case of ties, the node with the smallest index is selected.

Step-by-Step Algorithm

Initialization:
- Determine the number of nodes in the graph (numNodes) by getting the length of the adjacency matrix.
- Set nodeToRemove to the first element of the initial list, assuming it as the default node to be removed.
- Initialize maxSavedNodes to 0 to keep track of the maximum number of nodes that can be saved by removing a single node.
Mark Initially Infected Nodes:
- Create a boolean array isInfected of size numNodes to mark which nodes are initially infected.
- Loop through each node in the initial array and set isInfected[node] to true to indicate that these nodes are infected.
Prepare for DFS Traversal:
- Initialize three arrays discoveryTime, lowestTime, and malwareSpreadCount, each of size numNodes, filled with 0.
- discoveryTime: Tracks the discovery time of each node during the DFS traversal.
- lowestTime: Tracks the lowest discovery time reachable from each node.
- malwareSpreadCount: Tracks the number of nodes saved by removing each initially infected node.
DFS Traversal for Each Infected Node:
- Iterate through each node in the initial array:
  - If discoveryTime[node] is 0, perform a DFS from that node:
    - Set discoveryTime[currentNode] and lowestTime[currentNode] to currentTime.
    - Initialize isMalwareSpread to true if the current node (currentNode) is infected, otherwise false.
    - Initialize componentSize to 1 as the node itself is a part of its connected component.
    - Traverse all connected nodes of currentNode:
      - If there is a connection (graph[currentNode][neighbor] == 1):
        
        Check if the neighbor node has been visited (discoveryTime[neighbor] == 0):
        
        If not visited, recursively call DFS for neighbor with updated currentTime (currentTime + 1).
        
        After returning from DFS call:
        
        If subTreeSize is 0 (indicating that a subtree contains a malware node), set isMalwareSpread to true.
        
        If subTreeSize is greater than 0, add it to componentSize.
        
        If lowestTime[neighbor] is greater than or equal to discoveryTime[currentNode], update malwareSpreadCount[currentNode] by adding subTreeSize.
        
        Update lowestTime[currentNode] to the minimum of lowestTime[currentNode] and lowestTime[neighbor].
        
        If neighbor is not the parent of the currentNode, update lowestTime[currentNode] to the minimum of lowestTime[currentNode] and discoveryTime[neighbor].
Compute Results:
- After DFS traversal for all initially infected nodes, compute the number of nodes saved by removing each initially infected node.
- Iterate through the malwareSpreadCount array:
  - If the number of saved nodes (malwareSpreadCount[node]) is greater than maxSavedNodes or if it equals maxSavedNodes but the node index is smaller, update maxSavedNodes and nodeToRemove accordingly.
Return the Result:
- Return nodeToRemove as the node whose removal results in minimizing the spread of malware.

Algorithm Walkthrough

Given the input:

graph:

[[1, 0, 1],  
 [0, 1, 1],  
 [1, 1, 1]]

initial: [0, 2]

Initialization:
- numNodes is 3 (since there are 3 nodes in the graph).
- nodeToRemove is initialized to 0 (first node in the initial list).
- maxSavedNodes is initialized to 0.
- isInfected is [false, false, false] initially.
Mark Initially Infected Nodes:
- isInfected[0] is set to true.
- isInfected[2] is set to true.
- Now, isInfected becomes [true, false, true].
Prepare for DFS Traversal:
- discoveryTime is [0, 0, 0].
- lowestTime is [0, 0, 0].
- malwareSpreadCount is [0, 0, 0].
DFS Traversal for Each Infected Node:
- Start with node 0 (first infected node):
  - discoveryTime[0] is 0, so perform DFS on node 0:
    - Set discoveryTime[0] and lowestTime[0] to 1.
    - Initialize isMalwareSpread to true (node 0 is infected).
    - Initialize componentSize to 1.
    - Traverse neighbors of node 0:
      - Neighbor 0: Already visited (it's the current node), skip.
      - Neighbor 1: No direct connection (graph[0][1] is 0), skip.
      - Neighbor 2: Connected (graph[0][2] is 1) and discoveryTime[2] is 0:
        
        Perform DFS on node 2:
        
        Set discoveryTime[2] and lowestTime[2] to 2.
        
        Initialize isMalwareSpread to true (node 2 is infected).
        
        Initialize componentSize to 1.
        
        Traverse neighbors of node 2:
        
        Neighbor 0: Connected (graph[2][0] is 1), but it's the parent, so update lowestTime[2] to min(2, 1) = 1.
        
        Neighbor 1: Connected (graph[2][1] is 1), discoveryTime[1] is 0:
        
        Perform DFS on node 1:
        
        Set discoveryTime[1] and lowestTime[1] to 3.
        
        Initialize isMalwareSpread to false (node 1 is not infected).
        
        Initialize componentSize to 1.
        
        Traverse neighbors of node 1:
        
        Neighbor 0: No direct connection (graph[1][0] is 0), skip.
        
        Neighbor 1: Already visited (it's the current node), skip.
        
        Neighbor 2: Connected (graph[1][2] is 1), but already visited, update lowestTime[1] to min(3, 2) = 2.
        
        Return 1 (size of the component not affected by malware).
        
        Node 2: subTreeSize is 1 (node 1 is not infected), add to componentSize making it 2.
        
        Update malwareSpreadCount[2] to 1 (contributed by subTreeSize).
        
        Neighbor 2: Already visited (it's the current node), skip.
        
        Return 0 (since node 2 is infected).
    - Node 0: subTreeSize is 0 (node 2 is infected), set isMalwareSpread to true.
    - Node 0: Update lowestTime[0] to min(1, 1) = 1.
- Compute Result:
  - After DFS, malwareSpreadCount is [0, 0, 1].
  - Check node 0: malwareSpreadCount[0] is 0, maxSavedNodes remains 0.
  - Check node 2: malwareSpreadCount[2] is 1, update maxSavedNodes to 1, set nodeToRemove to 2.
Return Result:
- The optimal node to remove is 2, as it results in minimizing the spread of malware.

Code

java

import java.util.*;

public class Solution {

  // Helper method to perform DFS using Tarjan's algorithm
  private int dfs(
    int[][] graph,
    int currentNode,
    int parent,
    int currentTime,
    int[] discoveryTime,
    int[] lowestTime,
    boolean[] isInfected,
    int[] malwareSpreadCount
  ) {
    // Initialize discovery and lowest reachable time for the current node
    lowestTime[currentNode] = discoveryTime[currentNode] = currentTime;
    boolean isMalwareSpread = isInfected[currentNode];
    int componentSize = 1; // Initialize component size

    // Traverse all connected nodes
    for (int neighbor = 0; neighbor < graph[currentNode].length; neighbor++) {
      if (graph[currentNode][neighbor] == 1) {
        // Check if there is a connection
        if (discoveryTime[neighbor] == 0) {
          // If the neighbor has not been visited
          int subTreeSize = dfs(
            graph,
            neighbor,
            currentNode,
            currentTime + 1,
            discoveryTime,
            lowestTime,
            isInfected,
            malwareSpreadCount
          );
          if (subTreeSize == 0) {
            // If subtree contains a malware node
            isMalwareSpread = true;
          } else {
            componentSize += subTreeSize; // Add subtree size to the current component size
          }
          // Update malware spread count if the lowest time of the neighbor is greater than or equal to the discovery time
          if (lowestTime[neighbor] >= discoveryTime[currentNode]) {
            malwareSpreadCount[currentNode] += subTreeSize;
          }
          lowestTime[currentNode] = Math.min(
            lowestTime[currentNode],
            lowestTime[neighbor]
          ); // Update lowest reachable time
        } else if (neighbor != parent) {
          // If the neighbor is not the parent, update the lowest reachable time
          lowestTime[currentNode] = Math.min(
            lowestTime[currentNode],
            discoveryTime[neighbor]
          );
        }
      }
    }
    return isMalwareSpread ? 0 : componentSize; // Return the size of the component if malware doesn't spread
  }

  public int minMalwareSpread(int[][] graph, int[] initial) {
    int numNodes = graph.length; // Number of nodes in the graph
    int nodeToRemove = initial[0]; // Start with the first node in the initial array
    int maxSavedNodes = 0; // Maximum number of saved nodes

    boolean[] isInfected = new boolean[numNodes];
    for (int node : initial) isInfected[node] = true; // Mark initially infected nodes

    int[] discoveryTime = new int[numNodes];
    int[] lowestTime = new int[numNodes];
    int[] malwareSpreadCount = new int[numNodes];

    // Perform DFS for each initially infected node
    for (int node : initial) {
      if (discoveryTime[node] == 0) {
        dfs(
          graph,
          node,
          -1,
          1,
          discoveryTime,
          lowestTime,
          isInfected,
          malwareSpreadCount
        );
      }
      // Choose the node that maximizes the number of saved nodes
      if (
        malwareSpreadCount[node] > maxSavedNodes ||
        (malwareSpreadCount[node] == maxSavedNodes && node < nodeToRemove)
      ) {
        maxSavedNodes = malwareSpreadCount[node];
        nodeToRemove = node;
      }
    }
    return nodeToRemove;
  }

  // Main method for testing
  public static void main(String[] args) {
    Solution solution = new Solution();

    int[][] graph1 = { { 1, 0, 1 }, { 0, 1, 1 }, { 1, 1, 1 } };
    int[] initial1 = { 0, 2 };
    System.out.println(
      "Result for Example 1: " + solution.minMalwareSpread(graph1, initial1)
    );

    int[][] graph2 = {
      { 1, 1, 0, 0 },
      { 1, 1, 0, 0 },
      { 0, 0, 1, 1 },
      { 0, 0, 1, 1 },
    };
    int[] initial2 = { 1, 2 };
    System.out.println(
      "Result for Example 2: " + solution.minMalwareSpread(graph2, initial2)
    );

    int[][] graph3 = {
      { 1, 0, 0, 1 },
      { 0, 1, 1, 0 },
      { 0, 1, 1, 1 },
      { 1, 0, 1, 1 },
    };
    int[] initial3 = { 0, 2 };
    System.out.println(
      "Result for Example 3: " + solution.minMalwareSpread(graph3, initial3)
    );
  }
}

Complexity Analysis

Time Complexity

The time complexity of the above solution is , where V is the number of nodes (vertices) in the graph and E is the number of edges.

The DFS (Depth First Search) traversal using Tarjan's algorithm is done for each node, and each edge is explored at most once. Thus, the DFS takes time.

Overall, the time complexity is , as total number of edges are N*(N-1) if all vertices are connected to each other.

Space Complexity

The space complexity is due to:

Arrays like discoveryTime, lowestTime, malwareSpreadCount, and isInfected, each of size .
The recursion stack used in the DFS, which can go up to in the worst case.

Therefore, the total space complexity is .

🤖 Don't fully get this? Learn it with Claude

Stuck on Minimize Malware Spread II? Open Claude, copy a block below, and it'll teach you this exact concept — visually and interactively.

🪜 Hint ladder (no spoilers)

Progressively stronger hints — you still solve it.

I'm working on the problem **Minimize Malware Spread II** (DSA). Give me a HINT LADDER: start with the tiniest nudge, then wait. Only reveal the next, stronger hint when I ask. Do NOT show the full solution unless I type 'show solution'. Keep me doing the thinking. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🎨 Explain the approach visually

See the technique, not just code.

Explain the optimal approach to **Minimize Malware Spread II** with a VISUAL walkthrough: trace it on a small concrete example using ASCII art / a step-by-step diagram, narrate what changes each step, then give time & space complexity with a one-line derivation. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🔍 Review my solution

Catch bugs, edge cases, sub-optimality.

I'll paste my solution to **Minimize Malware Spread II**. Review it for correctness, missed edge cases, and time/space complexity, then coach me toward the optimal — don't just rewrite it. Ask me to paste my code now. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

🔁 Drill the pattern

Lock in recognition with look-alikes.

Give me 2 problems that use the SAME underlying pattern as **Minimize Malware Spread II**. For each, let me attempt first, then review my answer and name the trigger signal that reveals the pattern. If you're unsure or a claim isn't standard, say so and reason from first principles instead of guessing.

📝 My notes

← Minimize Malware Spread Critical Connections in a Network →