Finding the maximum x,y values across a stack of images

4 votes

1 answer

2238 views

数据工程师

Asked on 2025-04-17 20:59

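I have a stack of bitmap images (roughly 2000 to 4000) and I want to compute a maximum intensity projection along z. In other words, I need a 2D array holding the maximum value found at each x,y position across the stack.

I wrote a simple script that splits the file list into chunks and uses multiprocessing.Pool to compute a maximum array for each chunk. Those arrays are then compared to find the maximum over the whole stack.

This works, but it is slow. Looking at the system monitor, my CPU is barely doing any work.

Can anyone suggest how to speed this up?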

from PIL import Image
import os
import numpy as np
import multiprocessing
import sys

#Get the stack of images
files = []
for fn in os.listdir(sys.argv[1]):
    if fn.endswith('.bmp'):
        files.append(os.path.join(sys.argv[1], fn))

def processChunk(filelist):
    first = True
    max_ = None
    for img in filelist:
        im = Image.open(img)
        array = np.array(im)
        if first:
            max_ = array
            first = False
        max_ = np.maximum(array, max_)
    return max_



if __name__ == '__main__':

    pool = multiprocessing.Pool(processes=8)

    #Chop list into chunks
    file_chunks = []
    chunk_size = 100
    ranges = range(0, len(files), chunk_size)

    for chunk_idx in ranges:
        file_chunks.append(files[chunk_idx:chunk_idx+chunk_size])

    #find the maximum x,y vals in chunks of 100
    first = True
    maxi = None
    max_arrays = pool.map(processChunk, file_chunks )

    #Find the maximums from the maximums returned from each process
    for array in max_arrays:
        if first:
            maxi = array
            first = False
        maxi = np.maximum(array, maxi)
    img = Image.fromarray(maxi)
    img.save("max_intensity.tif")

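Tags: performance optimization, image processing, array comparison, multiprocessing, maximum intensity projection, z-projection, bitmap analysis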

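1 Answer

Edit:

I ran a small test on some sample data, and you are right. Also, after reading your code more carefully, I realized that most of my original post was wrong. You are actually doing about the same number of iterations (slightly more, but not three times as many). I also found that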

x = np.maximum(x, y)

is slightly faster than either of these:

x[y > x] = y[y > x]
#or
ind = y > x
x[ind] = y[ind]
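For anyone who wants to repeat that comparison, here is a minimal timing sketch. It assumes synthetic 1024x1024 uint8 arrays in place of real images, and the helper names `with_maximum` and `with_mask` are made up for illustration. Actual timings will depend on image size, dtype, and hardware, so treat the numbers it prints as indicative only.

```python
import timeit

import numpy as np

# Synthetic stand-ins for two image arrays; the shape and dtype are assumptions.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 256, size=(1024, 1024), dtype=np.uint8)
y = rng.integers(0, 256, size=(1024, 1024), dtype=np.uint8)


def with_maximum(x, y):
    # Element-wise maximum, allocating a new result array.
    return np.maximum(x, y)


def with_mask(x, y):
    # Boolean-mask update applied to a copy of x, so x0 stays untouched.
    x = x.copy()
    ind = y > x
    x[ind] = y[ind]
    return x


# Both approaches must agree before it makes sense to time them.
assert np.array_equal(with_maximum(x0, y), with_mask(x0, y))

t_max = timeit.timeit(lambda: with_maximum(x0, y), number=20)
t_mask = timeit.timeit(lambda: with_mask(x0, y), number=20)
print(f"np.maximum: {t_max:.4f}s  boolean mask: {t_mask:.4f}s")
```

Note that the mask version here also pays for a copy on every call, which the in-place snippets above do not, so the comparison is only approximate.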

So I would change your code only slightly, like this:

import numpy as np
from multiprocessing import Pool
from PIL import Image

def process(chunk):
    max_ = np.zeros((4000, 4000))
    for im in chunk:
        im_array = np.array(Image.open(im))
        max_ = np.maximum(max_, im_array)
    return max_

if __name__ == "__main__":
    p = Pool(8)

    chunksize = 500 #4000/8 = 500, might have less overhead
    chunks = [files[i:i+chunksize]
              for i in range(0, len(files), chunksize)]

    # this returns an array of (len(files)/chunksize, 4000, 4000)
    max_arrays = np.array(p.map(process, chunks))
    maxi = np.amax(max_arrays, axis=0) #finds maximum along first axis
    img = Image.fromarray(maxi) #should be of shape (4000, 4000)
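One caveat with the code above: `np.zeros((4000, 4000))` hard-codes the image size and produces a float64 accumulator, so the result no longer has the dtype of the input images. A possible alternative, sketched here under the assumption that all images share one shape and dtype, is to seed the accumulator from the first array and update it in place. The helper name `running_max` is hypothetical; inside `process` it could be fed with `np.array(Image.open(im))` for each file.

```python
import numpy as np


def running_max(arrays):
    """Per-pixel maximum over an iterable of equally shaped arrays."""
    max_ = None
    for arr in arrays:
        if max_ is None:
            # Copy the first array so the accumulator takes its shape and dtype
            # and the caller's data is never modified.
            max_ = np.array(arr, copy=True)
        else:
            # Write the element-wise maximum back into the accumulator.
            np.maximum(max_, arr, out=max_)
    return max_  # None if the iterable was empty


# Tiny stand-in "images": constant arrays with values 3, 9 and 5.
frames = (np.full((2, 3), v, dtype=np.uint8) for v in (3, 9, 5))
result = running_max(frames)
assert result.dtype == np.uint8 and (result == 9).all()
```

Using `out=` avoids allocating a fresh array on every iteration, which may or may not matter in practice next to the cost of reading each file.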

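I think this is about as fast as you are going to get, although I do have an idea for a tree-style or tournament-style algorithm, possibly a recursive one. Nice work.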
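To make the tournament idea concrete, here is a rough sketch of what I have in mind, written iteratively rather than recursively. The function name `tournament_max` is made up, and it assumes all arrays share one shape and dtype. Run serially it performs the same number of `np.maximum` calls as a plain loop (one fewer than the number of arrays), so any gain would have to come from running the pairs within a round in parallel, which this sketch does not do.

```python
import numpy as np


def tournament_max(arrays):
    """Reduce a list of equally shaped arrays by pairwise maxima, round by round."""
    arrays = list(arrays)
    if not arrays:
        raise ValueError("need at least one array")
    while len(arrays) > 1:
        next_round = []
        # Pair up neighbours; an unpaired last array advances unchanged.
        for i in range(0, len(arrays) - 1, 2):
            next_round.append(np.maximum(arrays[i], arrays[i + 1]))
        if len(arrays) % 2:
            next_round.append(arrays[-1])
        arrays = next_round
    return arrays[0]


# Small check against a direct reduction over the stacked arrays.
rng = np.random.default_rng(1)
stack = [rng.integers(0, 256, size=(4, 5), dtype=np.uint8) for _ in range(7)]
result = tournament_max(stack)
assert np.array_equal(result, np.max(np.stack(stack), axis=0))
```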

How big are the images? Are they small enough to hold two of them in memory at once? If so, could you try something like this:

maxi = np.zeros(image_shape) # something like (1024, 1024)

for im in files:
    im_array = np.array(Image.open(im))
    inds = im_array > maxi # find where image intensity > max intensity
    maxi[inds] = im_array[inds] # update the maximum value at each pixel 

max_im = Image.fromarray(maxi)
max_im.save("max_intensity.tif")

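After all the iterations, the maxi array holds the maximum intensity at each (x, y) coordinate. There is no need to split the work into chunks. There is also only one for loop instead of three, so it should be faster and may not need multiprocessing at all.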
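Since the question mentions that the CPU is mostly idle, it may be worth checking whether the time goes into reading the files rather than into the comparison itself. The sketch below is one way to measure that split. `profile_stack` and its `load` parameter are hypothetical names, and the demo uses a stand-in loader so that it runs without any image files.

```python
import time

import numpy as np


def profile_stack(paths, load):
    """Return (max_array, seconds_loading, seconds_in_maximum) for a stack.

    `load` is a hypothetical callable that turns one path into a NumPy array.
    """
    load_s = 0.0
    max_s = 0.0
    maxi = None
    for path in paths:
        t0 = time.perf_counter()
        arr = load(path)
        t1 = time.perf_counter()
        maxi = arr if maxi is None else np.maximum(maxi, arr)
        t2 = time.perf_counter()
        load_s += t1 - t0
        max_s += t2 - t1
    return maxi, load_s, max_s


# Stand-in loader so the sketch runs without image files: each "path" is
# just an integer used to fill a small array.
def fake_load(value):
    return np.full((2, 2), value, dtype=np.uint8)


maxi, load_s, max_s = profile_stack([1, 7, 4], fake_load)
assert (maxi == 7).all()
print(f"loading: {load_s:.6f}s  np.maximum: {max_s:.6f}s")
```

With real files, `load` could be something like `lambda p: np.array(Image.open(p))` after `from PIL import Image`. If loading turns out to dominate, that would be consistent with the idle CPU described in the question, and adding more worker processes may not help much.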

Answered on 2025-04-17 by Python大师
