ROI (region of interest) shows two lines of text inside the same bounding box

0 votes

1 answer

39 views

Asked 2025-04-14 16:03

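I am trying to detect every line of text in an image of Hindi text and draw a box around each one. The problem is that two lines of large text end up inside the same box. You can see the issue in the image below:

Original image:

Each line must be detected as its own separate line. Here is the source code: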

import cv2
from google.colab.patches import cv2_imshow
import numpy as np

if __name__ == "__main__":
  image = cv2.imread('datasets/0010_jpg.rf.e7741188a2afa6db3dee4324e8486a34.jpg')

  # Display the image
  # cv2_imshow(image)

  # Convert image to grayscale
  gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
  # cv2_imshow(gray)

  # Convert grayscale image to binary
  ret, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)
  # cv2_imshow(thresh)

  # Apply Canny edge detection
  edges = cv2.Canny(thresh, 50, 150)  # Adjust the threshold values as needed
  # cv2_imshow(edges)

  # Dilation
  kernel = np.ones((5, 200), np.uint8)
  img_dilation = cv2.dilate(edges, kernel, iterations=1)
  # cv2_imshow(img_dilation)

  # Find contours
  contours, hierarchy = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

  # Sort contours based on their bounding box coordinates
  bounding_boxes = [cv2.boundingRect(ctr) for ctr in contours]
  sorted_contours = [ctr for _, ctr in sorted(zip(bounding_boxes, contours), key=lambda pair: pair[0][1])]

  # Loop over sorted contours
  for i, ctr in enumerate(sorted_contours):
      # Get bounding box
      x, y, w, h = cv2.boundingRect(ctr)

      # Getting ROI
      roi = image[y:y+h-5, x:x+w]
      roi_row = roi.shape[0]
      roi_col = roi.shape[1]

      # Show ROI
      if roi_row > 3000 or roi_row <= 20 or roi_col <= 110:
          continue
      print(i)
      print(roi_row,roi_col)
      cv2_imshow(roi)
      cv2.rectangle(image, (x, y), (x + w, y + h), (90, 0, 255), 2)

  cv2_imshow(image)

image-processing computer-vision deep-learning bounding-box line-detection image-segmentation roi text-recognition

1 Answer

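I solved the problem of two lines of text being captured inside the same bounding rectangle (ROI) with the following steps:

1. Height threshold for detecting two lines: I noticed that when the height (h) of a bounding rectangle exceeds 60 pixels, it most likely contains two lines of text. I therefore use 60 pixels as the height threshold. This value comes from my image and may need adjusting for other font sizes or resolutions.

2. Splitting the ROI along the vertical axis: whenever a bounding rectangle is taller than the threshold, I cut the ROI at half its height into an upper and a lower region, each of which represents one line of text.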

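# Runs inside the contour loop, after x, y, w, h and roi are computed.
# k is a running count of detected lines (assumed to start at 0 before the loop);
# self.image is the image being annotated (plain `image` in the question's script).
if h > 60:
    # Split the ROI into an upper and a lower half
    roi1 = roi[:h//2, :]
    roi2 = roi[h//2:, :]

    # Draw the rectangle for the first (upper) ROI
    cv2.rectangle(self.image, (x, y), (x + w, y + h//2), (90, 0, 255), 2)

    # Draw the rectangle for the second (lower) ROI
    cv2.rectangle(self.image, (x, y + h//2), (x + w, y + h), (90, 0, 255), 2)

    k += 2
else:
    # Draw the rectangle for the single-line ROI
    cv2.rectangle(self.image, (x, y), (x + w, y + h), (90, 0, 255), 2)

    k += 1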
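
If you want to try the splitting rule on its own, without OpenCV, the two steps above can be sketched as a small helper that works on plain (x, y, w, h) tuples, the same layout cv2.boundingRect returns. The function name split_tall_boxes and the parameter max_line_height are hypothetical names chosen for this illustration, and the default of 60 pixels is only the value that worked for the image in the question. Treat it as a minimal sketch under those assumptions, not a general line segmenter: it assumes a tall box holds exactly two lines of roughly equal height.

```python
from typing import List, Tuple

# (x, y, w, h), the same layout that cv2.boundingRect returns
Box = Tuple[int, int, int, int]


def split_tall_boxes(boxes: List[Box], max_line_height: int = 60) -> List[Box]:
    """Return one box per text line.

    Any box taller than max_line_height is assumed to contain two lines
    and is cut at half its height (hypothetical helper, illustration only).
    """
    lines: List[Box] = []
    for x, y, w, h in boxes:
        if h > max_line_height:
            top_h = h // 2
            lines.append((x, y, w, top_h))              # upper line
            lines.append((x, y + top_h, w, h - top_h))  # lower line
        else:
            lines.append((x, y, w, h))                  # already a single line
    return lines


if __name__ == "__main__":
    # Made-up boxes: one normal line and one box that swallowed two lines
    detected = [(10, 20, 300, 40), (10, 80, 300, 90)]
    for box in split_tall_boxes(detected):
        print(box)
```

Each returned tuple can be passed to cv2.rectangle as (x, y) and (x + w, y + h). Because the cut is always at the midpoint, it will be off when the two lines have different heights, so check the result visually on your own images.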

Answered 2025-04-14 by Python大师
