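ROI (region of interest) contains two lines of text inside the same bounding box
I am trying to accurately detect each line of text in an image of Hindi text and draw a box around each line. The problem is that two lines of large text end up inside the same box. You can see the problem in the image below.
Each line must be detected as its own separate line. Here is the source code: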
import cv2
from google.colab.patches import cv2_imshow
import numpy as np

if __name__ == "__main__":
    image = cv2.imread('datasets/0010_jpg.rf.e7741188a2afa6db3dee4324e8486a34.jpg')
    # Display the image
    # cv2_imshow(image)

    # Convert image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # cv2_imshow(gray)

    # Convert grayscale image to binary
    ret, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)
    # cv2_imshow(thresh)

    # Apply Canny edge detection
    edges = cv2.Canny(thresh, 50, 150)  # Adjust the threshold values as needed
    # cv2_imshow(edges)

    # Dilation
    kernel = np.ones((5, 200), np.uint8)
    img_dilation = cv2.dilate(edges, kernel, iterations=1)
    # cv2_imshow(img_dilation)

    # Find contours
    contours, hierarchy = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Sort contours top to bottom by the y coordinate of their bounding box
    bounding_boxes = [cv2.boundingRect(ctr) for ctr in contours]
    sorted_contours = [ctr for _, ctr in sorted(zip(bounding_boxes, contours), key=lambda pair: pair[0][1])]

    # Loop over sorted contours
    for i, ctr in enumerate(sorted_contours):
        # Get bounding box
        x, y, w, h = cv2.boundingRect(ctr)

        # Getting ROI
        roi = image[y:y+h-5, x:x+w]
        roi_row = roi.shape[0]
        roi_col = roi.shape[1]

        # Skip ROIs that are too tall, too short, or too narrow
        if roi_row > 3000 or roi_row <= 20 or roi_col <= 110:
            continue

        # Show ROI
        print(i)
        print(roi_row, roi_col)
        cv2_imshow(roi)
        cv2.rectangle(image, (x, y), (x + w, y + h), (90, 0, 255), 2)

    cv2_imshow(image)
1 Answer
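I solved the problem of two lines of text being captured inside the same bounding rectangle (ROI) with the following steps:
1. Height threshold for detecting two lines: I observed that when the height (h) of a bounding rectangle exceeds 60 pixels, it most likely contains two lines of text. I therefore used 60 pixels as the height threshold.
2. Splitting the ROI along its height: when a bounding rectangle taller than the threshold is detected, I split the ROI into two separate regions, a top half and a bottom half, each representing one line of text.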
if h > 60:
    # Split ROI into two separate ROIs vertically
    roi1 = roi[:h//2, :]
    roi2 = roi[h//2:, :]
    # Update rectangle for the first ROI
    cv2.rectangle(self.image, (x, y), (x + w, y + h//2), (90, 0, 255), 2)
    # Update rectangle for the second ROI
    cv2.rectangle(self.image, (x, y + h//2), (x + w, y + h), (90, 0, 255), 2)
    k += 2
else:
    # Update rectangle for the ROI
    cv2.rectangle(self.image, (x, y), (x + w, y + h), (90, 0, 255), 2)
    k += 1
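
The splitting rule above can also be written as a small standalone helper, which makes it easy to check without an image. This is only a minimal sketch: the function name `split_tall_box` and the parameter `max_line_height` are hypothetical, and the 60-pixel default is simply the value from this answer, so it would likely need tuning for images with a different resolution or font size.

```python
def split_tall_box(x, y, w, h, max_line_height=60):
    """Return a list of (x, y, w, h) boxes.

    A box taller than max_line_height is assumed to hold two text lines
    and is cut into a top half and a bottom half; other boxes are
    returned unchanged. (Hypothetical helper, not part of OpenCV.)
    """
    if h > max_line_height:
        top_h = h // 2  # integer split row, as in the snippet above
        return [(x, y, w, top_h), (x, y + top_h, w, h - top_h)]
    return [(x, y, w, h)]


# Example boxes in the (x, y, w, h) layout returned by cv2.boundingRect
boxes = [(10, 5, 300, 40), (10, 60, 300, 90)]
line_boxes = [part for box in boxes for part in split_tall_box(*box)]
print(line_boxes)
# [(10, 5, 300, 40), (10, 60, 300, 45), (10, 105, 300, 45)]
```

Each returned tuple can then be passed to `cv2.rectangle` as `(x, y)` and `(x + w, y + h)`. Note that cutting at exactly half the height assumes both lines are roughly the same height.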