A Deep Dive Into Non-Maximum Suppression (NMS)

Non-maximum suppression (NMS) is a post-processing technique that is commonly used in object detection to eliminate duplicate detections and select the most relevant bounding boxes that correspond to the detected objects. It’s a critical step in many computer vision applications, such as face detection, pedestrian detection and object recognition.

Non-Maximum Suppression Explained

Non-maximum suppression (NMS) is a post-processing technique used in object detection to eliminate duplicate detections and select the most relevant detected objects. This helps reduce false positives and the computational complexity of a detection algorithm.

For example, I have an image in which I want to do localization and detection. I’ll divide it into many grids and determine if each grid contains an object. Each grid will have different confidence. How will you know which is the proper object grid? NMS will help you identify the false positives that can occur in object detection and eliminate redundant detections.

What Is Non-Maximum Suppression?

Non-maximum suppression is a technique that’s used after the region proposal step to eliminate duplicate bounding boxes and select the most relevant ones. The idea behind NMS is straightforward. It works by comparing the confidence scores of the proposed bounding boxes and eliminating the ones that overlap significantly with a higher-scoring bounding box.

More on AIWhat Is Neuromorphic Computing?

What Is Object Detection?

Object detection is the process of identifying objects of interest in an image or video stream. It’s a fundamental task in computer vision that has many real-world applications. Object detection typically involves two main steps: region proposal and classification. Region proposal is the process of generating potential bounding boxes that may contain objects of interest in the image. Classification is the process of determining whether each of the proposed regions contains an object or not.

Intersection Over Union (IoU) Explained

The overlapping regions between the bounding boxes are typically calculated using the intersection over union (IoU) metric. IoU measures the ratio of the area of overlap between two bounding boxes to the area of their union. If the IoU value between two bounding boxes exceeds a predefined threshold value, the bounding box with the lower confidence score is discarded.

The Jaccard index, also known as the intersection over union (IoU) metric, is simply a technique used to calculate the percentage of overlap between the prediction bounding box and the ground truth bounding box. Instead, we discover IoU between two forecast bounding boxes in NMS.

IoU stated:

Intersection Over Union(IoU) = (Target ∩ Prediction) / (Target U Prediction)

Using bounding boxes, it can be modified to:

IOU(Box1, Box2) = Intersection_Size(Box1, Box2) / Union_Size(Box1, Box2)

The process can be broken down into the following steps:

Sort the bounding boxes based on their confidence scores.
Select the bounding box with the highest confidence score and save it as a detection.
Remove all the bounding boxes that have a significant overlap with the selected bounding box. The amount of overlap is typically determined by a predefined threshold value.
Repeat steps 2 and 3 until no more bounding boxes remain.

How to Visualize Non-Maximum Suppression as a Graph

The NMS process can be visualized as a graph, where each bounding box is represented as a node and the edges between the nodes indicate the amount of overlap between the bounding boxes. The NMS algorithm finds the connected components in the graph and selects the node with the highest confidence score from each component.

def non_max_suppression(boxes, scores, threshold):
    """
    Perform non-max suppression on a set of bounding boxes and corresponding scores.

    :param boxes: a list of bounding boxes in the format [xmin, ymin, xmax, ymax]
    :param scores: a list of corresponding scores
    :param threshold: the IoU (intersection-over-union) threshold for merging bounding boxes
    :return: a list of indices of the boxes to keep after non-max suppression
    """
    # Sort the boxes by score in descending order
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        for j in order:
            # Calculate the IoU between the two boxes
            intersection = max(0, min(boxes[i][2], boxes[j][2]) - max(boxes[i][0], boxes[j][0])) * \
                           max(0, min(boxes[i][3], boxes[j][3]) - max(boxes[i][1], boxes[j][1]))
            union = (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1]) + \
                    (boxes[j][2] - boxes[j][0]) * (boxes[j][3] - boxes[j][1]) - intersection
            iou = intersection / union

            # Remove boxes with IoU greater than the threshold
            if iou > threshold:
                order.remove(j)
    return keep

This implementation takes a list of bounding boxes (boxes), a list of corresponding scores (scores) and an IoU threshold (threshold). It returns a list of indices of the boxes to keep after non-max suppression.

The input boxes must be in the format [xmin, ymin, xmax, ymax]. The function sorts the boxes by the score in descending order and iterates through them in that order. For each box, it checks its IoU with all the remaining boxes and removes any boxes with IoU greater than the threshold. The function returns the indices of the remaining boxes after non-max suppression.

More on AIIntroduction to Prolog: A Programming Language for AI

A tutorial on non-maximum suppression. | Video: DeepLearningAI

Benefits of Non-Maximum Suppression

NMS significantly reduces the number of false positives in object detection results. False positives occur when a bounding box is generated for an area of the image that does not contain an object. NMS eliminates these false positives by selecting only the most relevant bounding boxes that correspond to the detected objects. Additionally, NMS helps to reduce the computational complexity of object detection algorithms by eliminating redundant detections.