Averaging a multi-page TIFF in Python

Asked 2025-04-18 06:21

What is the fastest and most memory-efficient way to compute the average of many 16-bit TIFF image frames and get the result as a numpy array?

Here is the code I have come up with so far. To my surprise, method2 is faster than method1.

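Still, when it comes to performance one should never guess, only measure! So I would like to try a few more approaches. Is Wand worth trying? I left it out because I could not import it, even after installing ImageMagick-6.8.9-Q16 and setting the MAGICK_HOME environment variable... Are there any other Python libraries suited to multi-page TIFF? GDAL may be a bit too much for my needs.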

(Edit) I have added libtiff. method2 is still the fastest, and it is also fairly memory-efficient.
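To make the memory argument concrete, here is a minimal sketch on synthetic data (no TIFF file involved) of why adding frames one by one into a float64 accumulator keeps the working set small. The helper name `streaming_mean`, the frame count, and the image size are assumptions chosen for illustration; only NumPy is used.

```python
import numpy as np

def streaming_mean(frames, shape):
    """Average an iterable of 2-D frames while holding one frame at a time."""
    acc = np.zeros(shape, dtype=np.float64)  # float64 avoids uint16 overflow
    count = 0
    for frame in frames:
        acc += frame  # in-place add; the uint16 frame is upcast on the fly
        count += 1
    if count == 0:
        raise ValueError("no frames to average")
    return acc / count

# Synthetic stand-in for a 16-bit time-lapse: 100 frames of 64 x 48 pixels.
rng = np.random.default_rng(0)
stack = rng.integers(0, 2**16, size=(100, 64, 48), dtype=np.uint16)

# All-at-once version: requires the whole stack to be in memory.
mean_all_at_once = stack.mean(axis=0)

# Streaming version: here the frames come from the in-memory stack, but in
# practice they would be decoded from the file one page at a time.
mean_streamed = streaming_mean(iter(stack), stack.shape[1:])

np.testing.assert_allclose(mean_all_at_once, mean_streamed)

# Working-set comparison in bytes: whole stack vs. one frame plus accumulator.
print(stack.nbytes, stack[0].nbytes + mean_streamed.nbytes)
```

Both results agree; the difference is that the streaming version only ever needs one decoded frame plus the accumulator, which is roughly what method2 does.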

from time import time

#import cv2  ## no multi page tiff support
import numpy as np
from PIL import Image
#from scipy.misc import imread  ## no multi page tiff support
import tifffile # http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html
from libtiff import TIFF # https://code.google.com/p/pylibtiff/

fp = r"path/2/1000frames-timelapse-image.tif"

def method1(fp):
    '''
    using tifffile.py by Christoph (Version: 2014.02.05)
    (http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html)
    '''
    with tifffile.TIFFfile(fp) as imfile:
        return imfile.asarray().mean(axis=0)


def method2(fp):
    'primitive peak memory friendly way with tifffile.py'
    with tifffile.TIFFfile(fp) as imfile:

        nframe, h, w = imfile.series[0]['shape']
        temp = np.zeros( (h,w), dtype=np.float64 )

        for n in range(nframe):
            curframe = imfile.asarray(n)
            temp += curframe

        return (temp / nframe)


def method3(fp):
    ' like method2 but using pillow 2.3.0 '
    im = Image.open(fp)

    w, h = im.size
    temp = np.zeros( (h,w), dtype=np.float64 )

    n = 0
    while True:
        curframe = np.array(im.getdata()).reshape(h,w)
        temp += curframe
        n += 1
        try:
            im.seek(n)
        except EOFError:  # seek() raises EOFError past the last frame
            break

    return (temp / n)


def method4(fp):
    '''
    https://code.google.com/p/pylibtiff/
    documentation seems outdated.
    '''

    tif = TIFF.open(fp)
    header = tif.info()

    meta = dict()  # extracting meta
    for l in header.splitlines():
        # each info line looks like "key: value" or "key=value"
        if l.find(':') > 0:
            key, value = l.split(':', 1)
        elif l.find('=') > 0:
            key, value = l.split('=', 1)
        else:
            continue  # blank or unrecognized line: nothing to record
        meta[key] = value

    nframes = int(meta['frames'])
    h = int(meta['ImageLength'])
    w = int(meta['ImageWidth'])

    temp = np.zeros( (h,w), dtype=np.float64 )

    for frame in tif.iter_images():
        temp += frame

    return (temp / nframes)

t0 = time()
avgimg1 = method1(fp)
print(time() - t0)
# 1.17-1.33 s

t0 = time()
avgimg2 = method2(fp)
print(time() - t0)
# 0.90-1.53 s  usually faster than method1 by 20%

t0 = time()
avgimg3 = method3(fp)
print(time() - t0)
# 21 s

t0 = time()
avgimg4 = method4(fp)
print(time() - t0)
# 1.96 - 2.21 s  # may not be accurate. I got a warning for every frame with the tiff file I tested.

np.testing.assert_allclose(avgimg1, avgimg2)
np.testing.assert_allclose(avgimg1, avgimg3)
np.testing.assert_allclose(avgimg1, avgimg4)
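
The `tifffile.TIFFfile` spelling and the dict-style `series[0]['shape']` access above match the 2014 release I used. As far as I know, more recent tifffile releases spell the class `TiffFile` and expose the pages through `tif.pages`, so a rough equivalent of method2 might look like the sketch below. The function name `mean_of_pages` and the small synthetic file written with `tifffile.imwrite` are only for demonstration, and the sketch assumes every page has the same shape.

```python
import os
import tempfile

import numpy as np
import tifffile

def mean_of_pages(path):
    """Average all pages of a multi-page TIFF, decoding one page at a time."""
    with tifffile.TiffFile(path) as tif:
        acc = None
        count = 0
        for page in tif.pages:
            frame = page.asarray()
            if acc is None:
                # allocate the accumulator once the frame shape is known
                acc = np.zeros(frame.shape, dtype=np.float64)
            acc += frame
            count += 1
    if count == 0:
        raise ValueError("TIFF contains no pages")
    return acc / count

# Demonstration on a small synthetic 16-bit stack (not the original data).
rng = np.random.default_rng(1)
stack = rng.integers(0, 2**16, size=(20, 32, 24), dtype=np.uint16)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.tif")
    tifffile.imwrite(demo_path, stack, photometric="minisblack")
    avg = mean_of_pages(demo_path)

np.testing.assert_allclose(avg, stack.mean(axis=0))
```

I have not timed this against the numbers above, so treat it as an API update rather than a benchmark result.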

1 Answer

-1

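Simple logic tells me that method 1 or method 3 is the safer bet, because methods 2 and 4 contain for loops. For loops generally make your code slower as the input grows.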

I would definitely go with method 1: tidy, easy to understand...

If you want to be more certain, I suggest you benchmark these methods. If you do not want to benchmark, I would pick method 1.
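
If you do measure, keep in mind that a single `time()` difference is noisy (the question's own numbers range from 0.90 to 1.53 s for one method). One possible approach is a small harness that repeats each call and reports the fastest run. `best_of` and the two toy workloads below are placeholders I made up so the snippet runs anywhere; swap in the real methods.

```python
import timeit

def best_of(func, repeat=5):
    """Return the fastest wall-clock time (seconds) of `repeat` single calls."""
    return min(timeit.repeat(func, number=1, repeat=repeat))

# Toy stand-ins for the real methods, only so that the snippet is runnable.
def toy_a():
    return sum(range(100_000))

def toy_b():
    return sum(i for i in range(100_000))

results = {f.__name__: best_of(f) for f in (toy_a, toy_b)}

# Print the candidates from fastest to slowest.
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.6f} s")
```

For the real comparison you would call something like `best_of(lambda: method2(fp))` for each method.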

Regards,
