Averaging a multi-page TIFF in Python

Question

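What is the fastest and most memory-efficient way to compute the average of the frames of a multi-frame 16-bit TIFF image and store the result as a numpy array?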

The code I have come up with so far is shown below. To my surprise, method2 is faster than method1.

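That said, when profiling one should never guess but measure, so I would like to experiment with other approaches. Is Wand worth trying? I have not included it here because, even after installing ImageMagick-6.8.9-Q16 and setting the MAGICK_HOME environment variable, it still fails to import. Are there other Python libraries suited to multi-page TIFF? GDAL may be a bit too much for my needs.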
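On the point about measuring rather than guessing: single-run timings tend to be noisy, so one option is to repeat each call and keep the best time. The following is only a minimal sketch using the standard library and assuming Python 3; `best_of` is a hypothetical helper name that does not appear in the code of this post, and the workload is a stand-in.

```python
from time import perf_counter


def best_of(func, *args, repeats=5):
    """Call func(*args) `repeats` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        t0 = perf_counter()
        result = func(*args)
        elapsed = perf_counter() - t0
        best = min(best, elapsed)
    return best, result


# Stand-in workload (hypothetical); substitute one of the methods and a file path.
elapsed, total = best_of(sum, range(100000))
print("best of 5: %.6f s" % elapsed)
```

Taking the minimum rather than the mean is a design choice: it reduces the influence of unrelated system activity, although the first call may still benefit less from the file cache than later ones.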

(Edit) I have added libtiff. method2 is still the fastest, and it is fairly memory efficient.
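To make the memory argument concrete, here is a rough back-of-the-envelope sketch comparing a whole 16-bit stack held in memory with a single float64 accumulator plus one frame. The 512 x 512 frame size is purely an assumption for illustration, since the post does not state the image dimensions; only the frame count of 1000 is taken from the file name.

```python
import numpy as np

# Hypothetical dimensions: the frame size is not given in the post.
nframes, h, w = 1000, 512, 512

# Whole stack loaded at once as uint16.
full_stack_bytes = nframes * h * w * np.dtype(np.uint16).itemsize
# Frame-by-frame approach: one float64 accumulator plus the current uint16 frame.
accumulator_bytes = h * w * np.dtype(np.float64).itemsize
one_frame_bytes = h * w * np.dtype(np.uint16).itemsize

print("whole stack in memory: %.1f MiB" % (full_stack_bytes / 2.0**20))
print("accumulator + one frame: %.1f MiB"
      % ((accumulator_bytes + one_frame_bytes) / 2.0**20))
```

Under these assumed dimensions the frame-by-frame approach needs a few MiB where the whole stack needs hundreds, and the gap grows linearly with the number of frames.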

from time import time

#import cv2  ## no multi page tiff support
import numpy as np
from PIL import Image
#from scipy.misc import imread  ## no multi page tiff support
import tifffile # http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html
from libtiff import TIFF # https://code.google.com/p/pylibtiff/

fp = r"path/2/1000frames-timelapse-image.tif"

def method1(fp):
    '''
    using tifffile.py by Christoph Gohlke (Version: 2014.02.05)
    (http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html)
    '''
    with tifffile.TIFFfile(fp) as imfile:
        return imfile.asarray().mean(axis=0)


def method2(fp):
    'primitive peak memory friendly way with tifffile.py'
    with tifffile.TIFFfile(fp) as imfile:

        nframe, h, w = imfile.series[0]['shape']
        temp = np.zeros( (h,w), dtype=np.float64 )

        for n in range(nframe):
            curframe = imfile.asarray(n)
            temp += curframe

        return (temp / nframe)


def method3(fp):
    ' like method2 but using pillow 2.3.0 '
    im = Image.open(fp)

    w, h = im.size
    temp = np.zeros( (h,w), dtype=np.float64 )

    n = 0
    while True:
        curframe = np.array(im.getdata()).reshape(h,w)
        temp += curframe
        n += 1
        try:
            im.seek(n)
        except EOFError:  # Pillow raises EOFError when seeking past the last frame
            break

    return (temp / n)


def method4(fp):
    '''
    https://code.google.com/p/pylibtiff/
    documentation seems outdated.
    '''

    tif = TIFF.open(fp)
    header = tif.info()

    meta = dict()  # extract metadata from the "key: value" / "key=value" header lines
    for l in header.splitlines():
        if l.find(':') > 0:
            key, value = l.split(':', 1)
        elif l.find('=') > 0:
            key, value = l.split('=', 1)
        else:
            continue  # skip blank lines and lines without a separator
        meta[key] = value

    nframes = int(meta['frames'])
    h = int(meta['ImageLength'])
    w = int(meta['ImageWidth'])

    temp = np.zeros( (h,w), dtype=np.float64 )

    for frame in tif.iter_images():
        temp += frame

    return (temp / nframes)

t0 = time()
avgimg1 = method1(fp)
print(time() - t0)
# 1.17-1.33 s

t0 = time()
avgimg2 = method2(fp)
print(time() - t0)
# 0.90-1.53 s; usually about 20% faster than method1

t0 = time()
avgimg3 = method3(fp)
print(time() - t0)
# 21 s

t0 = time()
avgimg4 = method4(fp)
print(time() - t0)
# 1.96-2.21 s; may not be accurate: I got a warning for every frame with the TIFF file I tested

np.testing.assert_allclose(avgimg1, avgimg2)
np.testing.assert_allclose(avgimg1, avgimg3)
np.testing.assert_allclose(avgimg1, avgimg4)
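
The accumulation pattern shared by method2, method3 and method4 does not depend on the TIFF library, so it can be sketched on its own. The example below is only an illustration under stated assumptions: `mean_of_frames` is a hypothetical helper name, and a synthetic array stands in for the frames that a TIFF reader would supply.

```python
import numpy as np


def mean_of_frames(frames):
    """Average an iterable of equally shaped 2-D frames, holding one frame at a time."""
    total = None
    count = 0
    for frame in frames:
        if total is None:
            # float64 accumulator: summing in uint16 would wrap around at 65535
            total = np.zeros(frame.shape, dtype=np.float64)
        total += frame
        count += 1
    if count == 0:
        raise ValueError("no frames to average")
    return total / count


# Synthetic stand-in for a TIFF stack: 10 frames of 4 x 5 uint16 pixels.
rng = np.random.RandomState(0)
stack = rng.randint(0, 65536, size=(10, 4, 5)).astype(np.uint16)

streamed = mean_of_frames(iter(stack))
# The streamed result should match averaging the whole stack at once.
np.testing.assert_allclose(streamed, stack.mean(axis=0))
```

Because the helper counts frames as it goes, it does not need the frame count from the file metadata; a frame generator such as the `tif.iter_images()` call used in method4 could presumably be passed in the same way.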

