ROI (region of interest) captures two lines of text in the same bounding box

Asked 2025-04-14 16:03

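I am trying to detect every line of text in an image of Hindi text and draw a box around each line. The problem is that two lines of large text end up inside the same box. You can see the problem in the image below:

enter image description here

Original image: enter image description here

Every line must be detected as its own separate line. Here is the source code: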

import cv2
from google.colab.patches import cv2_imshow
import numpy as np

if __name__ == "__main__":
  image = cv2.imread('datasets/0010_jpg.rf.e7741188a2afa6db3dee4324e8486a34.jpg')

  # Display the image
  # cv2_imshow(image)

  # Convert image to grayscale
  gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
  # cv2_imshow(gray)

  # Convert grayscale image to binary
  ret, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)
  # cv2_imshow(thresh)

  # Apply Canny edge detection
  edges = cv2.Canny(thresh, 50, 150)  # Adjust the threshold values as needed
  # cv2_imshow(edges)

  # Dilation
  kernel = np.ones((5, 200), np.uint8)
  img_dilation = cv2.dilate(edges, kernel, iterations=1)
  # cv2_imshow(img_dilation)

  # Find contours
  contours, hierarchy = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

  # Sort contours based on their bounding box coordinates
  bounding_boxes = [cv2.boundingRect(ctr) for ctr in contours]
  sorted_contours = [ctr for _, ctr in sorted(zip(bounding_boxes, contours), key=lambda pair: pair[0][1])]

  # Loop over sorted contours
  for i, ctr in enumerate(sorted_contours):
      # Get bounding box
      x, y, w, h = cv2.boundingRect(ctr)

      # Getting ROI
      roi = image[y:y+h-5, x:x+w]
      roi_row = roi.shape[0]
      roi_col = roi.shape[1]

      # Skip ROIs that are too tall, too short, or too narrow to be a text line
      if roi_row > 3000 or roi_row <= 20 or roi_col <= 110:
          continue
      print(i)
      print(roi_row,roi_col)
      cv2_imshow(roi)
      cv2.rectangle(image, (x, y), (x + w, y + h), (90, 0, 255), 2)

  cv2_imshow(image)

1 Answer

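I solved the problem of two lines of text being captured in the same bounding rectangle (ROI) with the following steps:

1. Height threshold for detecting two lines: I observed that when the height (h) of a bounding rectangle exceeds 60 pixels, it most likely contains two lines of text. I therefore use 60 pixels as the height threshold. This value fits my image and will likely need tuning for other font sizes or resolutions.

2. Splitting the ROI: when a bounding rectangle is taller than the threshold, I cut the ROI at half its height into an upper and a lower region, each of which represents one line of text.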

# Runs inside the contour loop: x, y, w, h come from cv2.boundingRect(ctr)
# and roi is the cropped region. k is presumably a running count of text lines.
if h > 60:
    # Split the ROI at half height into an upper and a lower ROI
    roi1 = roi[:h//2, :]
    roi2 = roi[h//2:, :]

    # Draw the rectangle for the upper ROI
    cv2.rectangle(self.image, (x, y), (x + w, y + h//2), (90, 0, 255), 2)

    # Draw the rectangle for the lower ROI
    cv2.rectangle(self.image, (x, y + h//2), (x + w, y + h), (90, 0, 255), 2)

    k += 2
else:
    # Draw the rectangle for the single-line ROI
    cv2.rectangle(self.image, (x, y), (x + w, y + h), (90, 0, 255), 2)

    k += 1
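
The same threshold-and-split rule can also be written as a small helper that only works on box coordinates, which makes it easy to check without loading an image. This is a minimal sketch, not the exact code I run: the function name `split_tall_box` and the parameter `max_line_height` are made up for illustration, and the default of 60 pixels is just the threshold from step 1.

```python
def split_tall_box(x, y, w, h, max_line_height=60):
    """Return a list of (x, y, w, h) boxes, one per assumed text line.

    Hypothetical helper: a box taller than max_line_height is assumed to
    hold exactly two lines and is cut at half its height.
    """
    if h > max_line_height:
        top_h = h // 2
        # Upper half, then lower half (the lower half gets the odd pixel)
        return [(x, y, w, top_h), (x, y + top_h, w, h - top_h)]
    return [(x, y, w, h)]


# Example boxes as returned by cv2.boundingRect: (x, y, w, h)
boxes = [(10, 20, 300, 40), (10, 80, 300, 90)]
line_boxes = [b for box in boxes for b in split_tall_box(*box)]
print(line_boxes)
# [(10, 20, 300, 40), (10, 80, 300, 45), (10, 125, 300, 45)]
```

Each returned box can then be cropped and drawn with `cv2.rectangle` as in the snippet above. Note that a half-height cut only lands between the lines when both lines are about the same height, so boxes that merge three or more lines, or lines of very different size, would still come out wrong.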
