在Python中对多页TIFF进行平均处理
怎样才能最快、最省内存地计算多个16位TIFF图像帧的平均值,并把结果存成numpy数组呢?
我目前想到的代码如下。让我惊讶的是,方法2比方法1要快。
不过,做性能测试的时候可不能只靠猜,得实际测试一下!所以,我想再试试其他方法。值得尝试一下Wand库吗?我没有把它放进来,因为我安装了ImageMagick-6.8.9-Q16和设置了MAGICK_HOME环境变量,但还是没法导入……还有其他适合处理多页TIFF的Python库吗?GDAL可能对我来说有点复杂。
(编辑)我加上了libtiff库。结果还是方法2最快,而且相当省内存。
from time import time
#import cv2 ## no multi page tiff support
import numpy as np
from PIL import Image
#from scipy.misc import imread ## no multi page tiff support
import tifffile # http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html
from libtiff import TIFF # https://code.google.com/p/pylibtiff/
fp = r"path/2/1000frames-timelapse-image.tif"
def method1(fp):
'''
using tifffile.py by Christoph (Version: 2014.02.05)
(http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html)
'''
with tifffile.TIFFfile(fp) as imfile:
return imfile.asarray().mean(axis=0)
def method2(fp):
'primitive peak memory friendly way with tifffile.py'
with tifffile.TIFFfile(fp) as imfile:
nframe, h, w = imfile.series[0]['shape']
temp = np.zeros( (h,w), dtype=np.float64 )
for n in range(nframe):
curframe = imfile.asarray(n)
temp += curframe
return (temp / nframe)
def method3(fp):
' like method2 but using pillow 2.3.0 '
im = Image.open(fp)
w, h = im.size
temp = np.zeros( (h,w), dtype=np.float64 )
n = 0
while True:
curframe = np.array(im.getdata()).reshape(h,w)
temp += curframe
n += 1
try:
im.seek(n)
except:
break
return (temp / n)
def method4(fp):
'''
https://code.google.com/p/pylibtiff/
documentaion seems out dated.
'''
tif = TIFF.open(fp)
header = tif.info()
meta = dict() # extracting meta
for l in header.splitlines():
if l:
if l.find(':')>0:
parts = l.split(':')
key = parts[0]
value = ':'.join(parts[1:])
elif l.find('=')>0:
key, value =l.split('=')
meta[key] = value
nframes = int(meta['frames'])
h = int(meta['ImageLength'])
w = int(meta['ImageWidth'])
temp = np.zeros( (h,w), dtype=np.float64 )
for frame in tif.iter_images():
temp += frame
return (temp / nframes)
t0 = time()
avgimg1 = method1(fp)
print time() - t0
# 1.17-1.33 s
t0 = time()
avgimg2 = method2(fp)
print time() - t0
# 0.90-1.53 s usually faster than method1 by 20%
t0 = time()
avgimg3 = method3(fp)
print time() - t0
# 21 s
t0 = time()
avgimg4 = method4(fp)
print time() - t0
# 1.96 - 2.21 s # may not be accurate. I got warning for every frame with the tiff file I tested.
np.testing.assert_allclose(avgimg1, avgimg2)
np.testing.assert_allclose(avgimg1, avgimg3)
np.testing.assert_allclose(avgimg1, avgimg4)
1 个回答
-1
简单的逻辑让我觉得方法1或方法3更靠谱,因为方法2和方法4里面有for循环。for循环在处理更多输入时,通常会让你的代码变得更慢。
我肯定会选择方法1:整洁,容易理解……
如果想要更确定的话,我建议你去测试一下这些方法。如果你不想测试,我会选择方法1。
祝好,