优化数组过滤器的内存使用以避免块处理

3 投票

3 回答

1193 浏览

提问于 2025-04-16 11:34

我正在实现一些卫星图像的滤波器，首先是一个叫做增强李滤波器的。图像的大小可以达到5000x5000像素甚至更大。现在我的实现遇到了内存不足的问题，因为在处理这些大数组时，计算滤波器会耗尽内存（注意，移动平均和移动标准差滤波器可以一次性运行）。主要的困难在于必须在内存中保留多个数组，以便返回最终的滤波结果。之前我在这个问题中请求帮助，想要优化一个分块处理的函数，但我想问的是：有没有办法改进这段代码，让我不需要使用分块处理？

    def moving_average(Ic, filtsize):
        Im = numpy.empty(Ic.shape, dtype='Float32')
        scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
        return Im

    def moving_stddev(Ic, filtsize):
        Im = numpy.empty(Ic.shape, dtype='Float32')
        scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
        S = numpy.empty(Ic.shape, dtype='Float32')
        scipy.ndimage.filters.uniform_filter(((Ic-Im) ** 2), filtsize, output=S)
        return numpy.sqrt(S)

    def enh_lee(Ic, filtsize, nlooks, dfactor):
        # Implementation based on PCI Geomatica's FELEE function documentation
        Ci = moving_stddev(Ic, filtsize) / moving_average(Ic, filtsize) #1st array in memory
        Cu = numpy.sqrt(1 / nlooks) #scalar
        Cmax = numpy.sqrt(1 + (2 * nlooks)) #scalar
        W = numpy.exp(-dfactor * (Ci - Cu) / (Cmax - Ci)) #2nd array in memory
        Im = moving_average(Ic, filtsize) #3rd array in memory
        If = Im * W + Ic * (1 - W) #4th array in memory
        W = None # Back to 3 arrays in memory
        return numpy.select([Ci <= Cu, (Cu < Ci) * (Ci < Cmax), Ci >= Cmax], [Im, If, Ic])

其中 nlooks 和 dfactor 是标量，Ic 是未经过滤的数组。

根据你们的建议（我也在考虑 numexpr），我对 enh_lee 的改进代码如下，但仍然无法在最后一步不耗尽内存：

def enh_lee(Ic, filtsize, nlooks, dfactor):
    Im = moving_average(Ic, filtsize)
    Ci = moving_stddev(Ic, filtsize)
    Ci /= Im
    Cu = numpy.sqrt(1 / nlooks)
    Cmax = numpy.sqrt(1 + (2 * nlooks))

    W = Ci
    W -= Cu
    W /= Cmax - Ci
    W *= -dfactor
    numpy.exp(W, W)

    If = 1
    If -= W
    If *= Ic
    If += Im * W
    W = None
    return numpy.select([Ci <= Cu, (Cu < Ci) & (Ci < Cmax), Ci >= Cmax], [Im, If, Ic])

内存管理数据处理图像处理数组优化移动平均滤波器设计卫星图像 numexpr

3 个回答

作为一个减少内存使用的例子，我们来看看你的 moving_stddev() 函数。像这样的表达式：

((Ic-Im) ** 2)

会创建临时数组，但其实可以避免这些：

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im -= Ic
    Im **= 2
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    numpy.sqrt(Im, Im)
    return Im

需要注意的是，numpy.sqrt() 和 scipy.ndimage.filters.uniform_filter() 可以使用同一个输入和输出数组，这样是没问题的。不过要小心使用这些方法，因为有时候会产生意想不到的副作用。

这里有一个很好的回答，来自 Joe Kington，详细介绍了更多节省内存的方法。

回答于 2025-04-16 由 Python大师

分享举报

你在内存中计算你的数组，但实际上它们的数量更多。可以看看 numexpr，首页上的文字会告诉你发生了什么，这个工具可能会帮助你解决你的问题。

回答于 2025-04-16 由 Python大师

分享举报

这里有几个可以优化内存使用的方法... 有几个小技巧需要记住：

大多数numpy函数都有一个out参数，可以用来指定输出数组，而不是返回一个副本。例如，np.sqrt(x, x)会直接在原数组上计算平方根。
x += 1使用的内存只有x = x + 1的一半，因为后者会创建一个临时副本。尽量将计算拆分成*=、+=、/=等形式。如果不行，可以使用numexpr，正如@eumiro建议的那样。（或者无论如何都用numexpr... 在很多情况下都很方便。）

首先，这里是你原始函数在处理一个10000x10000的随机数据数组和filtsize为3时的性能表现：

原始函数的内存使用情况 enter image description here

有趣的是，最后出现了几个大峰值。这些峰值出现在你的numpy.select(...)部分。在很多地方，你无意中创建了额外的临时数组，但这些大部分都不重要，因为它们被select调用时的情况所掩盖。

无论如何，如果我们用这个相对冗长的版本替换你原来的（简洁明了的）代码，你的内存使用情况可以显著优化：

import numpy
import scipy.ndimage

def main(x=None):
    if x is None:
        ni, nj = 10000, 10000
        x = numpy.arange(ni*nj, dtype=numpy.float32).reshape(ni,nj)
    filtsize = 3
    nlooks = 10.0
    dfactor = 10.0
    x = enh_lee(x, filtsize, nlooks, dfactor)
    return x

def moving_average(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    return Im

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im *= -1
    Im += Ic
    Im **= 2
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    return numpy.sqrt(Im, Im)

def enh_lee(Ic, filtsize, nlooks, dfactor):
    # Implementation based on PCI Geomatica's FELEE function documentation
    Ci = moving_stddev(Ic, filtsize) 
    Im = moving_average(Ic, filtsize) 
    Ci /= Im

    Cu = numpy.sqrt(1 / nlooks).astype(numpy.float32) 
    Cmax = numpy.sqrt(1 + (2 * nlooks)).astype(numpy.float32) 

    W = Ci.copy()
    W -= Cu
    W *= -dfactor
    W /= Cmax - Ci
    W = numpy.exp(W, W)

    If = Im * W
    W *= -1
    W += 1
    W *= Ic
    If += W
    del W

    # Replace the call to numpy.select
    out = If
    filter = Ci <= Cu
    numpy.putmask(out, filter, Im)
    del Im

    filter = Ci >= Cmax
    numpy.putmask(out, filter, Ic)
    return out

if __name__ == '__main__':
    main()

这是这个代码的内存使用情况：

基于Numpy的优化版本内存使用情况 enter image description here

所以，我们大大减少了内存使用，但代码的可读性稍差（个人认为）。

~~不过，最后三个峰值是两个numpy.where的调用...~~

如果numpy.where支持out参数，我们可以进一步减少大约300Mb的峰值内存使用。不幸的是，它并不支持，我也不知道有什么更节省内存的方法...

我们可以用numpy.putmask替代numpy.select的调用，并在原地进行操作（感谢@eumiro在一个完全不同的问题中提到这一点）。

如果我们用numexpr来优化，代码会变得相对干净（与上面的纯numpy版本相比，不是与原始版本）。你可能还可以在这个版本中进一步减少内存使用... 我对numexpr不太熟悉，只是用过几次。

import numpy
import scipy.ndimage
import numexpr as ne

def main(x=None):
    if x is None:
        ni, nj = 10000, 10000
        x = numpy.arange(ni*nj, dtype=numpy.float32).reshape(ni,nj)
    filtsize = 3
    nlooks = 10.0
    dfactor = 10.0
    x = enh_lee(x, filtsize, nlooks, dfactor)
    return x

def moving_average(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    return Im

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im = ne.evaluate('((Ic-Im) ** 2)')
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    return ne.evaluate('sqrt(Im)')

def enh_lee(Ic, filtsize, nlooks, dfactor):
    # Implementation based on PCI Geomatica's FELEE function documentation
    Ci = moving_stddev(Ic, filtsize) 
    Im = moving_average(Ic, filtsize) 
    Ci /= Im

    Cu = numpy.sqrt(1 / nlooks).astype(numpy.float32) 
    Cmax = numpy.sqrt(1 + (2 * nlooks)).astype(numpy.float32) 

    W = ne.evaluate('exp(-dfactor * (Ci - Cu) / (Cmax - Ci))') 
    If = ne.evaluate('Im * W + Ic * (1 - W)') 
    del W

    out = ne.evaluate('where(Ci <= Cu, Im, If)') 
    del Im
    del If

    out = ne.evaluate('where(Ci >= Cmax, Ic, out)')
    return out

if __name__ == '__main__':
    main()

这是numexpr版本的内存使用情况：（注意，执行时间比原始版本减少了一半以上！）

基于Numexpr的优化版本内存使用情况* enter image description here

最大的内存使用仍然是在调用where时（替代了select的调用）。不过，峰值内存使用显著减少。进一步减少内存使用的最简单方法是找到某种方式让select在一个数组上原地操作。用cython来做这件事会比较简单（在纯python中嵌套循环会很慢，而numpy中的任何布尔索引都会创建一个额外的副本）。不过，你可能还是继续像之前那样分块处理输入数组更好...

顺便提一下，更新后的版本与原始代码的输出是一样的。原始的基于numpy的代码中有一个拼写错误...

回答于 2025-04-16 由 Python大师

分享举报

优化数组过滤器的内存使用以避免块处理

3 个回答

撰写回答