Finding the bounding boxes of RGB colors in an image

数据工程师

Asked 2025-04-17 13:45

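I am working on a page segmentation algorithm. Its output is an image in which the pixels of each region are assigned a unique color. I would like to process this image to find the bounding boxes of the regions. To do that I need to find all the colors, then all the pixels of each color, and finally the bounding box of those pixels.

Here is an example image.

[Image: example output showing the colored regions]

I have started with histograms of the R, G and B channels. The histograms tell me which values occur in each channel.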

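import numpy as np
from PIL import Image

img = Image.open(imgfilename)  # imgfilename: path to the segmentation output
img.load()
r, g, b = img.split()

ra, ga, ba = [np.asarray(p, dtype="uint8") for p in (r, g, b)]

# One histogram per channel; the nonzero bins are the values that occur.
rhist, edges = np.histogram(ra, bins=256)
ghist, edges = np.histogram(ga, bins=256)
bhist, edges = np.histogram(ba, bins=256)
print(np.nonzero(rhist))
print(np.nonzero(ghist))
print(np.nonzero(bhist))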

Output:

(array([ 0, 1, 128, 205, 255]),)
(array([ 0, 20, 128, 186, 255]),)
(array([ 0, 128, 147, 150, 255]),)

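At this point I am a bit stuck. By visual inspection I can see the colors (0,0,0), (1,0,0), (0,20,0), (128,128,128), and so on. How should I turn the nonzero output into pixel values that I can feed to np.where()?

I am considering flattening the 3-D (3, rows, cols) array into a 2-D plane of 24-bit packed RGB values (r<<16|g<<8|b) and searching that array. That feels brute-force and inelegant. Is there a better way in NumPy to find the bounding box of a color value?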

3 Answers

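This is just an off-the-top-of-my-head solution. You can iterate over the pixels of the image from the top-left to the bottom-right, keeping a top, bottom, left and right value for each color. For a given color, top is the first row in which you see a pixel of that color and bottom is the last such row; left is the smallest column index of any pixel of that color and right is the largest.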
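A minimal sketch of that scan, assuming the image has already been loaded as a (rows, cols, 3) NumPy array. The function name `scan_bounding_boxes` and the tiny synthetic `demo` image are made up for illustration. A pure-Python double loop like this will be slow on a full page scan, so read it as a statement of the idea rather than something tuned.

```python
import numpy as np

def scan_bounding_boxes(img):
    """One pass in row-major order; returns {color: (top, bottom, left, right)}."""
    boxes = {}
    n_rows, n_cols = img.shape[:2]
    for row in range(n_rows):
        for col in range(n_cols):
            color = tuple(int(v) for v in img[row, col, :3])
            if color not in boxes:
                # First sighting: scanning top to bottom, this row is the top edge.
                boxes[color] = [row, row, col, col]
            else:
                box = boxes[color]
                box[1] = row                 # rows never decrease, so this is the bottom so far
                box[2] = min(box[2], col)    # left
                box[3] = max(box[3], col)    # right
    return {color: tuple(box) for color, box in boxes.items()}

# Hypothetical test image: white background, one red block, one blue block.
demo = np.full((6, 8, 3), 255, dtype=np.uint8)
demo[1:3, 2:5] = (255, 0, 0)
demo[4:6, 0:2] = (0, 0, 255)
boxes = scan_bounding_boxes(demo)
print(boxes[(255, 0, 0)])
```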

Then, for each color, you can draw a rectangle from (top, left) to (bottom, right) in whatever color you like.
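Drawing the rectangle can be done with plain array slicing. This is only a sketch: `draw_box` is a hypothetical helper, it draws a one-pixel outline (a choice of mine, a filled rectangle would work just as well), and it assumes the box is given as (top, bottom, left, right) with inclusive bounds.

```python
import numpy as np

def draw_box(img, box, color):
    """Draw a 1-pixel outline in place; box is (top, bottom, left, right), inclusive."""
    top, bottom, left, right = box
    img[top, left:right + 1] = color       # top edge
    img[bottom, left:right + 1] = color    # bottom edge
    img[top:bottom + 1, left] = color      # left edge
    img[top:bottom + 1, right] = color     # right edge
    return img

canvas = np.zeros((6, 8, 3), dtype=np.uint8)
draw_box(canvas, (1, 4, 2, 6), (0, 255, 0))
```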

I don't know whether this counts as a good bounding box algorithm, but I think it is serviceable.

Answered 2025-04-17 by Python大师

Edit: Putting everything together into a working program, using the image you posted:

import itertools
import numpy as np
from PIL import Image

img = np.array(Image.open('test_img.png'))

def bounding_boxes(img):
    # Values that occur in each channel; their product is a superset of the real colors.
    r, g, b = [np.unique(img[..., j]) for j in (0, 1, 2)]
    boxes = {}
    for r0, g0, b0 in itertools.product(r, g, b):
        # Coordinates of every pixel that matches this candidate color exactly.
        rows, cols = np.where((img[..., 0] == r0) &
                              (img[..., 1] == g0) &
                              (img[..., 2] == b0))
        if len(rows):  # skip combinations that never occur in the image
            # Plain ints so the dictionary prints cleanly.
            boxes[(int(r0), int(g0), int(b0))] = (int(rows.min()), int(rows.max()),
                                                  int(cols.min()), int(cols.max()))
    return boxes

In [2]: %timeit bounding_boxes(img)
1 loops, best of 3: 30.3 s per loop

In [3]: bounding_boxes(img)
Out[3]: 
{(0, 0, 255): (3011, 3176, 755, 2546),
 (0, 128, 0): (10, 2612, 0, 561),
 (0, 128, 128): (1929, 1972, 985, 1438),
 (0, 255, 0): (10, 166, 562, 868),
 (0, 255, 255): (2938, 2938, 680, 682),
 (1, 0, 0): (10, 357, 987, 2591),
 (128, 0, 128): (417, 1873, 984, 2496),
 (205, 186, 150): (11, 56, 869, 1752),
 (255, 0, 0): (3214, 3223, 570, 583),
 (255, 20, 147): (2020, 2615, 956, 2371),
 (255, 255, 0): (3007, 3013, 600, 752),
 (255, 255, 255): (0, 3299, 0, 2591)}

Even with so few candidate colors to check, this is not very fast...
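Most of the time goes into scanning the whole image once per candidate color. One possible way around that, which I have not timed on the image above, is to label every pixel with its color index in a single np.unique call and then update each color's extremes with np.minimum.at and np.maximum.at. The function name `bounding_boxes_one_pass` and the synthetic `demo` image are assumptions for illustration.

```python
import numpy as np

def bounding_boxes_one_pass(img):
    """Return {(r, g, b): (top, bottom, left, right)} for a (rows, cols, 3) uint8 array."""
    n_rows, n_cols = img.shape[:2]
    rgb = img[..., :3].astype(np.uint32)
    # Pack each pixel into one integer so a color becomes a single comparable value.
    packed = (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]
    # labels[i] is the index into `colors` of flattened pixel i.
    colors, labels = np.unique(packed.ravel(), return_inverse=True)
    labels = labels.ravel()
    rows, cols = np.indices((n_rows, n_cols))
    rows, cols = rows.ravel(), cols.ravel()

    n_colors = len(colors)
    top = np.full(n_colors, n_rows, dtype=rows.dtype)
    left = np.full(n_colors, n_cols, dtype=cols.dtype)
    bottom = np.full(n_colors, -1, dtype=rows.dtype)
    right = np.full(n_colors, -1, dtype=cols.dtype)
    # Unbuffered ufunc updates: each pixel updates the extremes of its own color.
    np.minimum.at(top, labels, rows)
    np.maximum.at(bottom, labels, rows)
    np.minimum.at(left, labels, cols)
    np.maximum.at(right, labels, cols)

    boxes = {}
    for c, t, b, l, r in zip(colors.tolist(), top.tolist(), bottom.tolist(),
                             left.tolist(), right.tolist()):
        boxes[((c >> 16) & 0xFF, (c >> 8) & 0xFF, c & 0xFF)] = (t, b, l, r)
    return boxes

# Hypothetical test image: white background, one red block, one blue block.
demo = np.full((6, 8, 3), 255, dtype=np.uint8)
demo[1:3, 2:5] = (255, 0, 0)
demo[4:6, 0:2] = (0, 0, 255)
fast_boxes = bounding_boxes_one_pass(demo)
```

The result uses the same (top, bottom, left, right) ordering as bounding_boxes, so the two can be compared directly on a small image before trusting this on a large one.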

You can find the bounding box of the color (r0, g0, b0) with something like this:

rows, cols = np.where((ra == r0) & (ga == g0) & (ba == b0))
top, bottom = np.min(rows), np.max(rows)
left, right = np.min(cols), np.max(cols)

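Rather than iterating over all 2**24 RGB combinations, you can shrink the search space enormously by using only the nonzero bins of your histograms: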

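for r0, g0, b0 in itertools.product(np.nonzero(rhist)[0],  # [0]: nonzero() returns a tuple
                                    np.nonzero(ghist)[0],
                                    np.nonzero(bhist)[0]):
    rows, cols = np.where((ra == r0) & (ga == g0) & (ba == b0))
    if len(rows):  # skip combinations that never occur in the image
        print((r0, g0, b0), rows.min(), rows.max(), cols.min(), cols.max())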

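Some combinations that do not exist in the image will still slip in, but you can filter them out by checking that rows and cols are not empty arrays. In your example this cuts the search from 2**24 combinations down to just 5 * 5 * 5 = 125.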
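If the goal is just to avoid testing combinations that never occur, one option (my suggestion, separate from the histogram approach) is to ask NumPy for the distinct pixel values directly with np.unique(..., axis=0). The helper name `present_colors` and the synthetic `demo` image are hypothetical.

```python
import numpy as np

def present_colors(img):
    """List the distinct (r, g, b) tuples that occur in a (rows, cols, 3) array."""
    pixels = img[..., :3].reshape(-1, 3)     # one row per pixel
    unique_rows = np.unique(pixels, axis=0)  # distinct rows, sorted lexicographically
    return [tuple(int(v) for v in row) for row in unique_rows]

# Hypothetical test image: white background, one red block, one blue block.
demo = np.full((6, 8, 3), 255, dtype=np.uint8)
demo[1:3, 2:5] = (255, 0, 0)
demo[4:6, 0:2] = (0, 0, 255)
print(present_colors(demo))
```

Looping over this list instead of the itertools.product would mean one np.where call per color that is really there, so no empty results to filter.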

Answered 2025-04-17 by Python大师

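There is no real need to treat this as an RGB color image; it is just a visualization of a segmentation that someone else produced. You can simply treat it as a grayscale image, and for these particular colors you don't need to do anything else yourself.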

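import sys
import numpy
from PIL import Image

# Collapse RGB to one 8-bit channel; each region is assumed to keep a distinct gray level.
img = Image.open(sys.argv[1]).convert('L')

im = numpy.array(img)
colors = set(numpy.unique(im))
colors.discard(255)  # drop the white background; discard() is a no-op if it is absent

for color in colors:
    # numpy.where returns (row indices, column indices), i.e. (y, x).
    py, px = numpy.where(im == color)
    print(px.min(), py.min(), px.max(), py.max())  # left, top, right, bottom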
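Whether the grayscale shortcut is safe depends on no two region colors landing on the same gray level. A rough way to check is sketched below; `gray_collisions` is a hypothetical helper that takes the RGB array and the gray array and reports gray levels shared by more than one color. To keep the demo free of file I/O, the gray image is computed with the ITU-R 601-2 luma weights that the Pillow documentation gives for convert('L'); the rounding is my approximation, so for a real check pass numpy.array(rgb_img.convert('L')) instead.

```python
import numpy as np

def gray_collisions(rgb, gray):
    """Return {gray_level: [colors]} for gray levels produced by more than one RGB color."""
    pixels = rgb[..., :3].reshape(-1, 3)
    levels = gray.reshape(-1)
    # Distinct (gray, r, g, b) rows: one row per color that occurs in the image.
    pairs = np.unique(np.column_stack([levels, pixels]), axis=0)
    by_level = {}
    for level, r, g, b in pairs.tolist():
        by_level.setdefault(level, []).append((r, g, b))
    return {level: cols for level, cols in by_level.items() if len(cols) > 1}

# Hypothetical image containing (0,0,0), (1,0,0) and (255,0,0).
demo_rgb = np.zeros((2, 3, 3), dtype=np.uint8)
demo_rgb[0, 1] = (1, 0, 0)
demo_rgb[1, 2] = (255, 0, 0)
weights = np.array([0.299, 0.587, 0.114])  # stand-in for convert('L')
demo_gray = np.round(demo_rgb @ weights).astype(np.uint8)
collisions = gray_collisions(demo_rgb, demo_gray)
print(collisions)
```

In this demo (0,0,0) and (1,0,0) both end up at gray level 0, which is the kind of collision that would merge two regions into one box.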

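If you cannot rely on convert('L') to give you unique colors (that is, if you use colors beyond the ones in the given image), you can pack your image and get the unique colors from that: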

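...
# im[:,:,k] needs a 3-channel array, so img should be the RGB image here, not the 'L' conversion.
im = numpy.array(img, dtype=int)

# Pack each pixel into one 24-bit integer: 0xRRGGBB.
packed = im[:,:,0]<<16 | im[:,:,1]<<8 | im[:,:,2]
colors = set(numpy.unique(packed))
colors.discard(255<<16 | 255<<8 | 255)  # drop the white background

for color in colors:
    py, px = numpy.where(packed == color)
    print(px.min(), py.min(), px.max(), py.max())  # left, top, right, bottom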
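If you want to report each box together with its original color, the packed value can be split back into channels with shifts and masks. `unpack_rgb` is a hypothetical helper name.

```python
def unpack_rgb(color):
    """Invert r<<16 | g<<8 | b back into an (r, g, b) tuple."""
    color = int(color)
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF

packed_pink = 255 << 16 | 20 << 8 | 147
print(unpack_rgb(packed_pink))
```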

By the way, I would also suggest removing small connected components before looking for the bounding boxes.
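A rough sketch of that clean-up step, assuming SciPy is available (it is not used elsewhere in this thread). `drop_small_regions` and the `min_pixels` threshold are hypothetical; the function works on the boolean mask of one color, for example packed == color, and uses scipy.ndimage.label with its default connectivity.

```python
import numpy as np
from scipy import ndimage

def drop_small_regions(mask, min_pixels):
    """Return a copy of boolean `mask` without connected regions smaller than min_pixels."""
    labels, n_regions = ndimage.label(mask)   # label 0 is the background
    sizes = np.bincount(labels.ravel(), minlength=n_regions + 1)
    keep = sizes >= min_pixels
    keep[0] = False                           # never keep the background
    return keep[labels]

# Hypothetical mask: one 3x3 region plus an isolated stray pixel.
mask = np.zeros((6, 8), dtype=bool)
mask[1:4, 1:4] = True
mask[5, 7] = True
cleaned = drop_small_regions(mask, min_pixels=4)
rows, cols = np.where(cleaned)
print(cols.min(), rows.min(), cols.max(), rows.max())
```

Without the clean-up the stray pixel at (5, 7) would stretch the box to the corner of the image; with it the box covers only the 3x3 region.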

Answered 2025-04-17 by Python大师
