Line segmentation: how to detect the lines of a handwritten letter with cv2 and draw bounding boxes

Question

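I am working with a scanned image of a handwritten letter. My goal is to find a bounding box for each line of text in the image, and the bounding boxes must not overlap.

Input image:

Expected image after drawing the bounding boxes:

My steps so far:

Read the image with cv2: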

import cv2
image_binary = cv2.imread(input_path, cv2.IMREAD_COLOR)  # 3-channel BGR, as COLOR_BGR2GRAY below expects

Use MSER (Maximally Stable Extremal Regions) to detect all regions and draw bounding boxes.

# Detect MSER regions on the grayscale image
mser = cv2.MSER_create()
gray = cv2.cvtColor(image_binary, cv2.COLOR_BGR2GRAY)
regions, _ = mser.detectRegions(gray)
# Keep only regions whose bounding-box height is at most 100 px
regions = [region for region in regions if cv2.boundingRect(region)[3] <= 100]
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]

# Group the hull bounding boxes into text lines by their top edge
plines = []
for hull in hulls:
  x, y, w, h = cv2.boundingRect(hull)
  cnt_points = x, y, x+w, y+h
  added_to_line = False
  for pline in plines:
    # Same line if the top edge is within 50 px of the line's first box
    if abs(pline[0][1] - cnt_points[1]) <= 50:
      pline.append(cnt_points)
      added_to_line = True
      break

  if not added_to_line:
    plines.append([cnt_points])
    plines = sorted(plines, key=lambda pline: pline[0][1])

# Merge each group into one bounding box and draw it for visualization
lines = []
output = image_binary.copy()
for pline in plines:
  min_x = min([cnt[0] for cnt in pline])
  min_y = min([cnt[1] for cnt in pline])
  max_x = max([cnt[2] for cnt in pline])
  max_y = max([cnt[3] for cnt in pline])
  lines.append((min_x, min_y, max_x, max_y))
  cv2.rectangle(output, (min_x, min_y), (max_x, max_y), (0, 255, 0), 2)

cv2.imwrite('output.jpg', output)
print(lines)
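
Since the boxes must not overlap, it can help to check the printed `lines` list against that requirement. The following is a minimal sketch, not part of my original code: `overlapping_pairs` is a hypothetical helper name, and it assumes each box uses the same `(min_x, min_y, max_x, max_y)` layout as the entries of `lines`.

```python
def overlapping_pairs(boxes):
    """Return index pairs (i, j) of boxes whose rectangles intersect.

    Each box is (min_x, min_y, max_x, max_y), the same layout as the
    entries of `lines`. Boxes that only touch along an edge do not count.
    """
    pairs = []
    for i in range(len(boxes)):
        ax0, ay0, ax1, ay1 = boxes[i]
        for j in range(i + 1, len(boxes)):
            bx0, by0, bx1, by1 = boxes[j]
            # Two rectangles intersect when they overlap on both axes.
            if ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1:
                pairs.append((i, j))
    return pairs


# Made-up boxes for illustration: the first two overlap vertically.
example = [(10, 10, 200, 60), (12, 50, 210, 100), (15, 120, 190, 160)]
print(overlapping_pairs(example))  # [(0, 1)]
```

An empty result would mean the detected line boxes satisfy the non-overlap requirement.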

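MSER does not work well here, because some letters overlap the lines above or below them and the line spacing is inconsistent.

This is my actual output:

I understand that tuning the maximum height of a detected region (100 in my case) and the allowed vertical deviation within a line (50 in my case) might produce better results. But the main question is how to draw these bounding boxes in the real-world case.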
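
To make experimenting with those two thresholds easier, the grouping step can be pulled into a function that takes them as parameters. This is only a sketch of the same grouping logic operating on plain `(x, y, w, h)` tuples; the function name `group_boxes_into_lines` and the sample boxes are made up for illustration.

```python
def group_boxes_into_lines(boxes, max_height=100, max_deviation=50):
    """Group (x, y, w, h) boxes into text lines by their top edge.

    max_height and max_deviation correspond to the 100 and 50 used above.
    Returns one (min_x, min_y, max_x, max_y) box per line, top to bottom.
    """
    groups = []
    for x, y, w, h in boxes:
        if h > max_height:
            continue  # skip regions taller than the height limit
        for group in groups:
            # Compare against the first box of the group.
            if abs(group[0][1] - y) <= max_deviation:
                group.append((x, y, x + w, y + h))
                break
        else:
            # No existing line matched: start a new one, keep lines sorted by y.
            groups.append([(x, y, x + w, y + h)])
            groups.sort(key=lambda g: g[0][1])
    # Collapse each group into a single enclosing box.
    return [
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups
    ]


# Made-up boxes: two on one line, one on a second line, one too tall.
sample = [(10, 10, 20, 30), (40, 20, 20, 30), (10, 100, 20, 30), (5, 0, 10, 150)]
print(group_boxes_into_lines(sample))
# [(10, 10, 60, 50), (10, 100, 30, 130)]
print(len(group_boxes_into_lines(sample, max_deviation=5)))
# 3
```

With real data the input would be `[cv2.boundingRect(hull) for hull in hulls]`. Tightening `max_deviation` splits lines more aggressively, which is why no single value fits a letter with uneven spacing.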

Update: following @Christoph Rackwitz's suggestion, the requirement is to determine the number of lines in this handwritten letter.
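
If only the number of lines is needed, one commonly used alternative to MSER is a horizontal projection profile: count the ink pixels in each image row and count the runs of non-empty rows. The sketch below illustrates that idea on a hand-written profile and is not something I have validated on my letter. `count_text_lines`, `min_ink`, and `min_height` are hypothetical names. For a thresholded NumPy image `binary` with ink as nonzero pixels (also an assumption), the profile could be obtained with `(binary > 0).sum(axis=1)`.

```python
def count_text_lines(row_ink, min_ink=1, min_height=1):
    """Count runs of consecutive rows holding at least min_ink ink pixels.

    row_ink: per-row count of ink pixels, ordered top to bottom.
    Runs shorter than min_height rows are ignored as noise.
    Returns (count, spans); each span is (first_row, last_row), inclusive.
    """
    spans = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink >= min_ink and start is None:
            start = y  # a run of inked rows begins
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                spans.append((start, y - 1))
            start = None  # the run ended on the previous row
    # Close a run that reaches the bottom edge of the image.
    if start is not None and len(row_ink) - start >= min_height:
        spans.append((start, len(row_ink) - 1))
    return len(spans), spans


# Made-up profile: two text lines and a one-row speck of noise at row 7.
profile = [0, 0, 5, 9, 7, 0, 0, 1, 0, 6, 8, 8, 4, 0]
print(count_text_lines(profile, min_ink=1, min_height=2))
# (2, [(2, 4), (9, 12)])
```

Where ascenders and descenders from neighbouring lines overlap, the rows between them are never empty, so adjacent lines may merge into one run. Raising `min_ink` may help in such cases, but that is a guess rather than a tested result.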

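Tags: image processing, computer vision, image analysis, bounding box, handwriting recognition, line segmentation, MSER, line detection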
