Optimizing performance when reading a satellite image file in Python

4 votes

3 answers

1587 views

数据工程师

Asked 2025-04-17 01:11

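I have a multi-band satellite image stored in a file in the band-interleaved-by-pixel (BIP) format, together with a separate header file. The header provides details such as the number of rows and columns in the image and the number of bands (which can exceed the standard 3).

The image is stored like this (assuming 5 bands):

[B1][B2][B3][B4][B5][B1][B2][B3][B4][B5] ... and so on (pixels are stored starting from the top-left corner, and each pixel takes 5 bytes, one per band).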
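To make the layout concrete, here is a minimal sketch of the byte offset it implies, assuming one byte per sample and rows stored top to bottom. The helper name bip_offset is hypothetical and used only for illustration.

```python
def bip_offset(row, col, band, width, nbands):
    """Byte offset of one band sample in a BIP file with 1-byte samples."""
    # Pixels are laid out row by row; each pixel holds nbands consecutive bytes.
    return (row * width + col) * nbands + band


width, nbands = 4, 5
first_sample = bip_offset(0, 0, 0, width, nbands)       # B1 of the top-left pixel
second_pixel = bip_offset(0, 1, 0, width, nbands)       # B1 of the next pixel
second_row_band3 = bip_offset(1, 0, 2, width, nbands)   # B3 of the first pixel in row 1
print(first_sample, second_pixel, second_row_band3)
```

With these assumed dimensions, consecutive samples of the same band are always nbands bytes apart, which is what makes splitting by stride possible.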

I need to separate these bands and convert each one into a PIL image in Python 3.2 (on 64-bit Windows 7), but I suspect my current approach is wrong. My code is as follows:

def OpenBIPImage(file, width, height, numberOfBands):
    """
    Opens a raw image file in the BIP format and returns a list
    comprising each band as a separate PIL image.
    """
    bandArrays = []
    with open(file, 'rb') as imageFile:
        data = imageFile.read()
    currentPosition = 0
    for i in range(height * width):
        for j in range(numberOfBands):
            if i == 0:
                bandArrays.append(bytearray(data[currentPosition : currentPosition + 1]))
            else:
                bandArrays[j].extend(data[currentPosition : currentPosition + 1])
            currentPosition += 1
    bands = [Image.frombytes('L', (width, height), bytes(bandArray)) for bandArray in bandArrays]
    return bands

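This code is far too slow at opening the BIP file; there must be a better way. I also have numpy and scipy available, but I am not sure how to use them here or whether they would help.

Because the number of bands varies, I am finding it hard to read the file quickly and split the image into its bands.

I have also tried varying how the lists are built in the loop (with slicing, without slicing, only append, only extend, and so on), but it made little difference, because most of the time goes to the sheer number of iterations (width * height * number of bands).

Any suggestions or advice would be very helpful. Thanks.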

3 Answers

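Standard PIL

To load an image from a file, use the open function in the Image module.

>>> from PIL import Image
>>> im = Image.open("lena.ppm")

If successful, this function returns an Image object. You can then use the object's attributes to examine the file contents.

>>> print(im.format, im.size, im.mode)
PPM (512, 512) RGB

The format attribute identifies the source of the image. If the image was not read from a file, it is set to None. The size attribute is a 2-tuple containing the width and height (in pixels). The mode attribute defines the number and names of the bands in the image, as well as the pixel type and depth. Common modes are "L" (luminance) for greyscale images, "RGB" for true colour images, and "CMYK" for pre-press images.

The Python Imaging Library also lets you work with the individual bands of a multi-band image, such as an RGB image. The split method creates a set of new images, each containing one band from the original multi-band image. The merge function takes a mode and a tuple of images, and combines them into a new image. The following sample swaps the three bands of an RGB image:

Splitting and merging bands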

r, g, b = im.split()
im = Image.merge("RGB", (b, g, r))

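So I think you should simply work out the mode and then split accordingly.

PIL with Spectral Python (the SPy Python module)

However, as you mentioned in the comments below, you are not dealing with an ordinary 3-band RGB image. To handle that case, Spectral Python (a pure-Python module that requires PIL) may be exactly what you need.

Specifically: http://spectralpython.sourceforge.net/class_func_ref.html#spectral.io.bipfile.BipFile

spectral.io.bipfile.BipFile handles image files in the band-interleaved-by-pixel (BIP) format.

Hope this helps.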

Answered 2025-04-17 by Python大师


I suspect that calling extend repeatedly is the problem; it is better to allocate everything up front.

from PIL import Image

def OpenBIPImage(file, width, height, numberOfBands):
    """
    Opens a raw image file in the BIP format and returns a list
    comprising each band as a separate PIL image.
    """
    with open(file, 'rb') as imageFile:
        data = imageFile.read()
    # Preallocate one zero-filled buffer per band instead of growing it with extend().
    bandArrays = [bytearray(height * width) for j in range(numberOfBands)]
    currentPosition = 0
    for i in range(height * width):
        for j in range(numberOfBands):
            # Indexing bytes in Python 3 yields an int, which a bytearray slot accepts.
            bandArrays[j][i] = data[currentPosition]
            currentPosition += 1
    bands = [Image.frombytes('L', (width, height), bytes(bandArray)) for bandArray in bandArrays]
    return bands

My measurements do not show such a pronounced slowdown: the bare loops on their own take under a second.

import time

def x():
    # Time only the empty nested loops, without touching any image data.
    height, width, numberOfBands = 1401, 801, 6
    before = time.time()
    for i in range(height * width):
        for j in range(numberOfBands):
            pass
    print(time.time() - before)

>>> x()
0.937999963760376
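The timing above covers only the empty loops. As a rough sketch for comparing the per-byte loop against strided slicing of the bytes object, the following runs both on synthetic data and checks that they agree. The function names and the small image dimensions are assumptions for illustration, and no timings are claimed here since they depend on the machine.

```python
import time


def split_with_loop(data, nbands):
    """Copy one byte at a time into preallocated per-band buffers."""
    npixels = len(data) // nbands
    bands = [bytearray(npixels) for _ in range(nbands)]
    pos = 0
    for i in range(npixels):
        for j in range(nbands):
            bands[j][i] = data[pos]
            pos += 1
    return [bytes(b) for b in bands]


def split_with_slices(data, nbands):
    """Take every nbands-th byte starting at offset j; the copy runs in C."""
    return [data[j::nbands] for j in range(nbands)]


# Synthetic stand-in for a BIP file: 200 x 100 pixels, 6 bands.
width, height, nbands = 200, 100, 6
data = bytes((i * 7) % 256 for i in range(width * height * nbands))

before = time.time()
loop_bands = split_with_loop(data, nbands)
loop_seconds = time.time() - before

before = time.time()
slice_bands = split_with_slices(data, nbands)
slice_seconds = time.time() - before

print("loop: ", loop_seconds)
print("slice:", slice_seconds)
print("same result:", loop_bands == slice_bands)
```

Each element of slice_bands is a bytes object of width * height bytes, so it could be passed to Image.frombytes('L', (width, height), ...) in the same way as in the function above.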

Edited

Answered 2025-04-17 by Python大师


If you can find a fast function that loads a large amount of binary data into one big Python list (or a numpy array), you can split the data with slicing:

band0 = biglist[::nbands]
band1 = biglist[1::nbands]
....

Does that make sense?
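For the numpy route, one possible sketch is to view the raw bytes as a uint8 array, reshape it to (height, width, bands), and slice along the last axis. This assumes one byte per sample and that the dimensions come from the header file; the function name split_bip and the synthetic data are made up for illustration.

```python
import numpy as np


def split_bip(raw, width, height, nbands):
    """Return one (height, width) uint8 array per band from raw BIP bytes."""
    pixels = np.frombuffer(raw, dtype=np.uint8)
    if pixels.size != width * height * nbands:
        raise ValueError("data size does not match the header dimensions")
    # BIP order is row, then column, then band, so band is the last axis.
    cube = pixels.reshape(height, width, nbands)
    # Each slice is a strided view into the same buffer, so no data is copied here.
    return [cube[:, :, j] for j in range(nbands)]


# Synthetic 3 x 2 image with 5 bands: byte value = band index + 10 * pixel index.
width, height, nbands = 3, 2, 5
raw = bytes(band + 10 * pixel
            for pixel in range(width * height)
            for band in range(nbands))

bands = split_bip(raw, width, height, nbands)
print(bands[0])
print(bands[4])
```

For a real file, the bytes could be read with open(path, 'rb').read() or loaded directly with np.fromfile(path, dtype=np.uint8). To get PIL images, each band could then be passed to Image.fromarray, for example after np.ascontiguousarray(band) to obtain a contiguous copy.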

Answered 2025-04-17 by Python大师

