Numpy PIL Python: crop an image at whitespace, or crop text using histogram thresholds

Asked by 数据工程师 on 2025-04-18 12:52

How can I find the bounding box (or window) of the whitespace region surrounding the digits in the image below?

Original image:

[Image: the original image]

Height: 762 px
Width: 1014 px

Goal:

I'd like to get something like {x-bound:[x-upper,x-lower], y-bound:[y-upper,y-lower]} so that I can crop out the text and feed it to tesseract or another OCR tool.
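Once such a bounds dict exists, the crop itself is a single call to PIL's `Image.crop`, which takes a box of `(left, upper, right, lower)`. A minimal sketch, assuming each pair in the dict is `[start, end)` in pixels (the question's `upper`/`lower` naming is read here as start/end); `crop_to_bounds` is a hypothetical helper name. The cropped image could then be handed to an OCR tool, for example `pytesseract.image_to_string` if pytesseract is installed.

```python
from PIL import Image

def crop_to_bounds(im, bounds):
    """Crop a PIL image with a bounds dict shaped like the one in the question.

    Assumed shape: {"x-bound": [x_start, x_end], "y-bound": [y_start, y_end]},
    with the end values exclusive, matching PIL's crop box convention.
    """
    x0, x1 = bounds["x-bound"]
    y0, y1 = bounds["y-bound"]
    # PIL expects the box as (left, upper, right, lower)
    return im.crop((x0, y0, x1, y1))

# Usage on a blank canvas of the same size as the question's image (width x height)
canvas = Image.new("L", (1014, 762), color=255)
crop = crop_to_bounds(canvas, {"x-bound": [100, 400], "y-bound": [50, 150]})
print(crop.size)  # (300, 100)
```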

Attempts:

I considered slicing the image into fixed-size blocks and analyzing them at random, but I suspect that would be too slow.
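For what it's worth, the fixed-size-block idea need not be slow if every block is summed at once with a NumPy reshape instead of being visited one at a time in Python. A minimal sketch, assuming a 2-D grayscale array with a white background; `block_ink` and the block size of 32 are illustrative choices, not from the question.

```python
import numpy as np

def block_ink(gray, block=32):
    """Total 'ink' (inverted intensity) in each block x block tile.

    gray: 2-D uint8 array with a white (255) background.
    Rows/columns that do not fill a whole tile are trimmed off.
    """
    h, w = gray.shape
    h, w = h - h % block, w - w % block
    ink = 255 - gray[:h, :w].astype(np.int64)
    # (h, w) -> (h/block, block, w/block, block), then sum inside each tile
    tiles = ink.reshape(h // block, block, w // block, block)
    return tiles.sum(axis=(1, 3))

# Synthetic 762x1014 white image with a dark rectangle standing in for text
img = np.full((762, 1014), 255, dtype=np.uint8)
img[300:360, 200:700] = 0
sums = block_ink(img, block=32)
print(sums.shape)  # (23, 31)
```

The nonzero tiles outline the text region at block resolution, which is coarser than the per-pixel profiles below.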

Here is some example code using pyplot, adapted from (How do I grab a block of text from an image with Python and PIL?):

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

im = Image.open('/home/jmunsch/Pictures/Aet62.png')
p = np.array(im)
p = p[:, :, 0:3]      # keep the RGB channels, drop alpha if present
p = 255 - p           # invert: dark text -> large values, white background -> 0
lx, ly, lz = p.shape  # (height, width, channels)

plt.plot(p.sum(axis=1))  # row profile, one curve per channel (indexed by y)
plt.plot(p.sum(axis=0))  # column profile, one curve per channel (indexed by x)
plt.show()

# I was thinking something like this:
# the image is a 3-dimensional ndarray with shape (y, x, color)
# set each value below the mean along axis 0 to 0
p[p < p.mean(axis=0)] = 0

# and then some type of enumerated groupby on each axis,
# finding the mean index of each groupby(0) run,
# and cropping with those indices, e.g.
# plt.imshow(p[mean_index1:mean_index2, mean_index3:mean_index4])

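Judging by the plots, each valley marks the position of a boundary.

The first plot shows where the lines of text are.
The second plot shows where the characters are.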
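The valley idea can be coded without a Python-level groupby: threshold a projection profile and find where it switches between "ink" and "no ink". A minimal sketch, assuming a grayscale image with dark text on a plain white background (a dark border or frame would also count as ink); the helper names `ink_runs`, `profiles` and `text_bounds` are hypothetical. The default threshold of 0 treats any non-white pixel as ink; passing the profile's mean instead mirrors the question's "set values below the mean to 0" idea, at the risk of clipping thin strokes.

```python
import numpy as np

def ink_runs(profile, threshold=0):
    """Return [start, end) index pairs of consecutive entries where profile > threshold."""
    above = np.concatenate(([False], profile > threshold, [False])).astype(np.int8)
    # diff is +1 where a run starts and -1 one past where it ends
    edges = np.flatnonzero(np.diff(above))
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

def profiles(gray):
    """Row and column 'ink' profiles of a grayscale image with a white background."""
    ink = 255 - gray.astype(np.int64)        # invert, like p = 255 - p in the question
    return ink.sum(axis=1), ink.sum(axis=0)  # one value per row (y), per column (x)

def text_bounds(gray, threshold=0):
    """Bounds dict in the question's shape, or None if no ink is found."""
    rows, cols = profiles(gray)
    y_runs, x_runs = ink_runs(rows, threshold), ink_runs(cols, threshold)
    if not y_runs or not x_runs:
        return None
    return {"x-bound": [x_runs[0][0], x_runs[-1][1]],
            "y-bound": [y_runs[0][0], y_runs[-1][1]]}

# Synthetic test image: two "lines", the first holding two "characters"
img = np.full((762, 1014), 255, dtype=np.uint8)
img[100:140, 200:260] = 0
img[100:140, 300:360] = 0
img[200:240, 200:260] = 0

rows, cols = profiles(img)
print(ink_runs(rows))    # [(100, 140), (200, 240)]  -> text lines
print(ink_runs(cols))    # [(200, 260), (300, 360)]  -> character columns
print(text_bounds(img))  # {'x-bound': [200, 360], 'y-bound': [100, 240]}
```

Because `cols` is computed over the whole image, characters on different lines that share columns merge into one run; for per-character boxes, compute the column profile inside each line's row range.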

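Example plot from `plt.plot(p.sum(axis=1))`:

[Image: plot of p.sum(axis=1)]

Example plot from `plt.plot(p.sum(axis=0))`:

[Image: plot of p.sum(axis=0)]

Update: HYRY's solution

[Image: result of HYRY's solution]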

Tags: image-processing, PIL, bounding-box, text-recognition, whitespace, ocr, crop, histogram, threshold

1 Answer

I think you can use the morphology functions in scipy.ndimage. Here is an example:

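import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage

# PNGs load as floats in [0, 1]; keep only pure-white pixels of the first channel as True
img = plt.imread("Aet62.png")[:, :, 0] >= 1.0
# Opening (erosion, then dilation) drops white regions too thin to survive 40 erosion steps
img2 = ndimage.binary_erosion(img, iterations=40)
img3 = ndimage.binary_dilation(img2, iterations=40)
# Label the remaining white regions and keep the largest one
labels, n = ndimage.label(img3)
counts = np.bincount(labels.ravel())
counts[0] = 0  # ignore the background label
img4 = labels == np.argmax(counts)
# Fill the holes left by the text so the mask covers the whole white area
img5 = ndimage.binary_fill_holes(img4)
# Text = non-white pixels inside that area
result = ~img & img5
# A small opening removes isolated specks
result = ndimage.binary_erosion(result, iterations=3)
result = ndimage.binary_dilation(result, iterations=3)
plt.imshow(result, cmap="gray")
plt.show()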

The output is:

[Image: output of the code above]
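The answer stops at a boolean mask (`result`, True on text pixels), while the question asked for a bounding box. A minimal sketch of that last step, assuming `result` is such a 2-D boolean array: `np.nonzero` with min/max gives one box around all text, and `ndimage.label` plus `ndimage.find_objects` give one `(row_slice, col_slice)` box per connected component (for example per digit, if the digits do not touch). The helper names are hypothetical, and the stand-in mask below is synthetic, not the answer's actual output.

```python
import numpy as np
from scipy import ndimage

def mask_bounds(mask):
    """Bounds dict (question's shape, [start, end)) of all True pixels, or None."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return {"x-bound": [int(xs.min()), int(xs.max()) + 1],
            "y-bound": [int(ys.min()), int(ys.max()) + 1]}

def component_boxes(mask):
    """One (row_slice, col_slice) pair per connected component, in label order."""
    labels, _ = ndimage.label(mask)
    return ndimage.find_objects(labels)

# Synthetic stand-in for the answer's `result` mask
result = np.zeros((762, 1014), dtype=bool)
result[100:140, 200:260] = True
result[100:140, 300:360] = True
print(mask_bounds(result))  # {'x-bound': [200, 360], 'y-bound': [100, 140]}
for rows, cols in component_boxes(result):
    print(rows, cols)
```

Each pair of slices can index the original array directly (`p[rows, cols]`) to cut out one component.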

Answered 2025-04-18 by Python大师
