How to merge bounding boxes into one bounding box

1 answer

User

#1 · Posted 2024-05-15 18:30:56

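Here is one possible solution. Note, however, that I can't guarantee it will work for the more complex images you haven't posted. Also, you are getting free help from strangers on the internet, so don't expect a complete solution that solves your problem with no effort on your part. Helping others is cool, but please set your expectations reasonably.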

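The approach gets the bounding box of the biggest object in the image. It relies on these assumptions:

You process one sketch and its caption per image. If an image contains more than one figure, this approach won't help, and you will have to cut the figures apart manually. For example, the grill image has to be split into two images.
Some figures cannot be separated from their captions with a rectangle. An axis-aligned rectangle that encloses the figure will also enclose the caption if the caption lies inside that rectangular region. (In that case you would have to filter out the caption before applying this method, extend the method, or crop with a polygon instead, which is a different problem.)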

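The method reduces the image to its horizontal and vertical projections. We only need the start and end points of the two projections to construct a bounding rectangle. Ideally, the projections span exactly the figure. If the caption does not overlap the figure's projection, we can filter it out by keeping only the largest projection segment. This approach is simple and works well for images like the grill.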
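To see why two projections are enough, here is a minimal NumPy-only sketch (no OpenCV). The function bbox_from_projections and the toy image are hypothetical. The sketch builds a box from the first and last non-zero entries of each max-projection. It deliberately does no filtering, so the stray pixel in row 7 stretches the box vertically. Keeping only the largest projection segment is meant to prevent exactly that.

```python
import numpy as np

def bbox_from_projections(binary):
    """Return (x, y, w, h) of all non-zero pixels using only the two max-projections."""
    horizontal = binary.max(axis=0)  # one value per image column
    vertical = binary.max(axis=1)    # one value per image row
    cols = np.flatnonzero(horizontal)
    rows = np.flatnonzero(vertical)
    if cols.size == 0:
        return None  # no foreground at all
    x, y = cols[0], rows[0]
    return int(x), int(y), int(cols[-1] - x + 1), int(rows[-1] - y + 1)

img = np.zeros((10, 12), dtype=np.uint8)
img[2:5, 3:8] = 255  # the "figure"
img[7, 6] = 255      # a stray pixel (think: part of a caption)
print(bbox_from_projections(img))  # (3, 2, 5, 6)
```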

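Here are the steps:

Convert the input to grayscale
Resize the grayscale image, because the images are huge and we don't need all that information
Binarize the resized image with Otsu's threshold
Apply some morphology: a small closing to join the small parts of the figure is enough
Reduce the image to its horizontal and vertical projections using the cv2.reduce function
Keep only the biggest projection segment (filter out the smaller ones)
Get the start and end points of that segment (really just a number each)
Construct the bounding box
Scale the bounding box back up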

I split the grill image manually here: Part 1 and Part 2. Let's see the code:

# Imports
import cv2
import numpy as np
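# Read image:
imagePath = "D://opencvImages//"
inputImage = cv2.imread(imagePath+"sketch03.png")
# imread returns None (it does not raise) when the file cannot be read:
if inputImage is None:
    raise FileNotFoundError(imagePath + "sketch03.png")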

# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)

# Get image dimensions
originalImageHeight, originalImageWidth = grayscaleImage.shape[:2]

# Resize at a fixed scale:
resizePercent = 30
resizedWidth = int(originalImageWidth * resizePercent / 100)
resizedHeight = int(originalImageHeight * resizePercent / 100)

# resize image
resizedImage = cv2.resize(grayscaleImage, (resizedWidth, resizedHeight))

# Threshold via Otsu:
_, binaryImage = cv2.threshold(resizedImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

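The first bit resizes the image to 30 percent of its original size. That is enough for the images you posted. The process is fairly straightforward and produces this (downscaled) binary image: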
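The threshold call combines THRESH_BINARY_INV with THRESH_OTSU. The threshold value is chosen automatically, and the dark strokes become white (255) foreground on a black background. Below is a rough NumPy sketch of what Otsu's method computes. It is not OpenCV's implementation, tie-breaking may differ, and otsu_threshold and binarize_inv are hypothetical names.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the level t that maximises between-class variance (class 0 = pixels <= t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)           # weight of the dark class for each t
    mu = np.cumsum(prob * levels)  # cumulative first moment up to t
    mu_total = mu[-1]
    # sigma_b^2(t) = (mu_total * w0 - mu)^2 / (w0 * (1 - w0)); undefined when a class is empty
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return int(np.argmax(sigma_b))

def binarize_inv(gray, t):
    """Like THRESH_BINARY_INV: pixels above t become 0, all others become 255."""
    return np.where(gray > t, 0, 255).astype(np.uint8)

page = np.full((4, 4), 220, dtype=np.uint8)  # light background
page[1:3, 1:3] = 20                          # dark "ink"
t = otsu_threshold(page)
ink = binarize_inv(page, t)
print(t, np.count_nonzero(ink))  # 20 4
```

With a dark square (level 20) on a light background (level 220), the chosen threshold separates the two levels. The inverted binarization then turns only the square into 255.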

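We can apply a little morphology to join the smaller parts of the figure into one solid component. Let's apply a closing (a dilation followed by an erosion) with a 3 x 3 rectangular structuring element: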
# Perform a little bit of morphology:
# Set kernel (structuring element) size:
kernelSize = (3, 3)
# Set operation iterations:
opIterations = 1
# Get the structuring element:
morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
# Perform closing (dilate, then erode):
binaryImage = cv2.morphologyEx(binaryImage, cv2.MORPH_CLOSE, morphKernel,
                               iterations=opIterations, borderType=cv2.BORDER_REFLECT101)
This is the result:
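The closing is just a 3 x 3 maximum filter (dilation) followed by a 3 x 3 minimum filter (erosion). The NumPy sketch below assumes a binary 0/255 image. Instead of the BORDER_REFLECT101 used above, it pads the borders with constants (0 for dilation, 255 for erosion), so results can differ at the image edges. close3x3 is a hypothetical helper.

```python
import numpy as np

def _neighbourhood_reduce(img, reducer, pad_value):
    """Apply reducer (np.max or np.min) over every 3 x 3 neighbourhood."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="constant", constant_values=pad_value)
    # The nine shifted views of the padded image cover each pixel's 3 x 3 window.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return reducer(np.stack(windows), axis=0)

def close3x3(binary):
    """Morphological closing with a 3 x 3 rectangular element: dilate, then erode."""
    dilated = _neighbourhood_reduce(binary, np.max, 0)  # grow white regions
    return _neighbourhood_reduce(dilated, np.min, 255)  # shrink them back

strokes = np.zeros((5, 11), dtype=np.uint8)
strokes[2, 2:5] = 255  # left segment
strokes[2, 6:9] = 255  # right segment, one-pixel gap at column 5
closed = close3x3(strokes)
print(closed[2])  # row 2 is now 255 from column 2 through column 8
```

Two stroke segments separated by a one-pixel gap become a single segment. That is what lets the later reduction treat the figure as one blob.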
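OK, let's reduce this image. We first get the horizontal projection by reducing the rows (dim = 0 collapses the image into a single row). Then we get the vertical projection by reducing the columns (dim = 1 collapses it into a single column). With REDUCE_MAX, each value in the horizontal projection is the maximum intensity of the corresponding image column, and each value in the vertical projection is the maximum of the corresponding image row.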
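For a single-channel image, cv2.reduce with REDUCE_MAX amounts to taking the maximum along one axis. dim=0 yields a 1 x W row and dim=1 yields an H x 1 column. Here is a small NumPy illustration of both projections on a made-up 3 x 4 image (NumPy stands in for cv2.reduce here):

```python
import numpy as np

img = np.array([[0,   0, 255, 0],
                [0,   0,   0, 0],
                [255, 0,   0, 0]], dtype=np.uint8)

# Like cv2.reduce(img, 0, cv2.REDUCE_MAX): collapse the rows into one 1 x W row.
horizontal = img.max(axis=0, keepdims=True)
# Like cv2.reduce(img, 1, cv2.REDUCE_MAX): collapse the columns into one H x 1 column.
vertical = img.max(axis=1, keepdims=True)
print(horizontal.tolist(), vertical.tolist())
```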
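Once the projections are computed, we can immediately filter out the smaller lines. We compute the contours, get their bounding rectangles, and keep the largest one. In practice each "rectangle" is just a start point and a length, since a projection is only a line. In this step we also store the start/end points: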
# Set number of reductions (dimensions):
dimensions = 2
# Store the data of both reductions here:
boundingRectsList = []

# Reduce the image:
for i in range(dimensions):
    # Reduce image, first horizontal, then vertical:
    reducedImg = cv2.reduce(binaryImage, i, cv2.REDUCE_MAX)

    # Get biggest line (biggest blob) and its start/ending coordinate,
    # set initial values for the largest contour:
    largestArea = 0
    # Find the contours on the binary image (OpenCV 4.x returns two values, 3.x returns three):
    contours, hierarchy = cv2.findContours(reducedImg, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    # Create a temporary tuple to store the rectangle data:
    tempRect = ()

    # Get the largest contour in the contours list:
    for j, c in enumerate(contours):
        boundRect = cv2.boundingRect(c)
        # Get the dimensions of the bounding rect:
        rectX = boundRect[0]
        rectY = boundRect[1]
        rectWidth = boundRect[2]
        rectHeight = boundRect[3]
        # Get the bounding rect area:
        area = rectWidth * rectHeight

        # Store the info of the largest contour:
        if area > largestArea:
            largestArea = area
            # Store the bounding rectangle data:
            if i == 0:
                # the first dimension is horizontal:
                tempRect = (rectX, rectWidth)
            else:
                # the second dimension is vertical:
                tempRect = (rectY, rectHeight)

    # Got the biggest contour:
    boundingRectsList.append(tempRect)
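Each projection is a line one pixel thick, so the contour search effectively looks for runs of non-zero values, and the bounding-rect area of each run is just its length. The sketch below is a hypothetical NumPy-only substitute for findContours plus boundingRect, not what the answer runs. It returns the start and length of the longest run, the same kind of (start, size) pair stored in boundingRectsList. Ties go to the first run, which may not match OpenCV's contour order.

```python
import numpy as np

def longest_run(projection):
    """Return (start, length) of the longest run of non-zero values, or None if there is none."""
    mask = np.asarray(projection).ravel() > 0  # works for 1 x W and H x 1 projections
    # Pad with False so every run has a rising edge (+1) and a falling edge (-1).
    edges = np.diff(np.concatenate(([False], mask, [False])).astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)  # exclusive end indices
    if starts.size == 0:
        return None
    lengths = ends - starts
    best = int(np.argmax(lengths))  # first longest run wins ties
    return int(starts[best]), int(lengths[best])

# A figure (columns 1-4) and a shorter caption-like run (columns 7-8):
proj = np.array([0, 255, 255, 255, 255, 0, 0, 255, 255, 0], dtype=np.uint8)
print(longest_run(proj))  # (1, 4)
```

The shorter run, which plays the role of the caption, is discarded, just as the "largest area filter" does.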
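This is pretty much the meat of the process. These images show the horizontal and vertical projections of the first image:
Horizontal projection:
Vertical projection: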
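Note the second (smaller) line in the horizontal projection. It corresponds to the caption, which our "largest area filter" ignores. All the relevant information is now stored in the boundingRectsList variable. Let's construct the bounding rectangle, scale the values back up, and draw the rectangle on the original, full-size input: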
# Compute resize factors:
horizontalFactor = originalImageWidth / resizedWidth
verticalFactor = originalImageHeight / resizedHeight

# Create bounding box:
boundingRectX = boundingRectsList[0][0] * horizontalFactor
boundingRectY = boundingRectsList[1][0] * verticalFactor
boundingRectWidth = boundingRectsList[0][1] * horizontalFactor
boundingRectHeight = boundingRectsList[1][1] * verticalFactor

# Draw the bounding rectangle on the original input:
color = (0, 0, 255)
cv2.rectangle(inputImage, (int(boundingRectX), int(boundingRectY)),
              (int(boundingRectX + boundingRectWidth), int(boundingRectY + boundingRectHeight)), color, 2)

# Show image:
cv2.imshow("Rectangle", inputImage)
cv2.waitKey(0)
This produces:
The second grill image:
The first shoe image:
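In the last snippet, the scale factors are computed from the actual integer sizes rather than from 100 / resizePercent. This absorbs the truncation that int() applies when resizing. Here is a worked example with hypothetical dimensions (a 2481 x 3508 scan at 30%) and a hypothetical reduced box. As a small design choice, this sketch rounds the corners instead of truncating them with int().

```python
def resized_dims(width, height, percent):
    # Same truncation as int(originalImageWidth * resizePercent / 100) in the answer.
    return int(width * percent / 100), int(height * percent / 100)

def upscale_box(x, w, y, h, original_size, resized_size):
    """Map (x, w) from the horizontal reduction and (y, h) from the vertical one to full size."""
    orig_w, orig_h = original_size
    small_w, small_h = resized_size
    fx, fy = orig_w / small_w, orig_h / small_h
    return x * fx, y * fy, w * fx, h * fy

def to_corners(x, y, w, h):
    # cv2.rectangle needs integer points; this sketch rounds instead of truncating.
    return (round(x), round(y)), (round(x + w), round(y + h))

original = (2481, 3508)              # hypothetical full-size scan
small = resized_dims(*original, 30)  # (744, 1052), not (744.3, 1052.4)
box = upscale_box(100, 200, 50, 300, original, small)
corners = to_corners(*box)
print(small, corners)  # (744, 1052) ((333, 167), (1000, 1167))
```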
