Why is Python slower at reading TIFF images when a random generator is used to pick the image paths?

Posted 2024-04-25 19:40:39


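I am trying to write a Python program that reads a random subset of TIFF images from a large collection of TIFF images. Interestingly, I found that when the indices come from a random generator and are used to build the list of image paths, Python reads the TIFF images (which hold floating-point values) much more slowly than when hard-coded random indices are used to pick the image paths.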

import datetime
import matplotlib.pyplot as plt
import numpy

def read_in_seq(image_filenames, indices):
    return [ plt.imread(image_filenames[index]) for index in indices ]

image_filenames = []

for index in range(15000):
    image_filenames.append("/tmp/%05d" % index + ".tiff")

# Generated once with numpy.random.choice(15000, 100), then hard-coded here
indices=[
  3885,   901,  6233,  7234, 10195,  2204,   469,  2906, 12114, 13515, 12977, 5201,
  8829, 11537,  5400,  9633, 10744, 12991,  2593,  3046,  5103,  1901,  8831, 12454,
  9779,  4714, 10839,  8702,  8537,  2136,  5095,  9006, 13293,  9933,  3584, 10818,
  8594, 11032,  3705,   435,  6679,  8349,  6930,  9741, 12933,  3231,  1849,  7871,
 11752,  8361,  3094,  2229, 14303,  2006,  5554,  1492, 14817, 12690, 10648, 14631,
  6401,  6181,  4401,  7222,  9881,  8381,  7603, 11374, 12702,  6881, 11868, 10967,
 14508, 12930,  3542,  1197,  8387, 11253,  1802, 14732,  7419, 11994,  6083,  8846,
  5370,  4276, 13953, 14409,  8197,  8956,  4717,  3262,  2314, 12527,  5394, 12495,
  6708,  9724,   740, 10416]

print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f') + ": Normal input read started with size=" + str(len(indices)))
output = read_in_seq(image_filenames, indices) # takes 0.8 seconds
print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f') + ": Normal input read completed with size=" + str(len(output)))

indices = numpy.random.choice(15000, 100)
print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f') + ": Random input read started with size=" + str(len(indices)))
output = read_in_seq(image_filenames, indices) # takes ~3 seconds
print(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f') + ": Random input read completed with size=" + str(len(output)))

Here is the output:

2018-01-10 15:30:46.170487: Normal input read started with size=100
2018-01-10 15:30:46.943557: Normal input read completed with size=100
2018-01-10 15:30:46.943718: Random input read started with size=100
2018-01-10 15:30:49.858074: Random input read completed with size=100

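All 15000 TIFF images are the same, about 3 MB each. As you can see, the normal input, which reads 100 of the 15000 TIFF images using hard-coded random indices, takes only about 0.8 seconds. However, when the indices come from a random generator (e.g. numpy.random), the read takes almost 3 seconds.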

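On the other hand, if we modify the code above to read 100 PNG images out of 15000, the read time with hard-coded random indices is almost the same as with indices generated by numpy.random (about 4 seconds each; 3.5 and 4.7 seconds in the run below).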

for index in range(15000):
    image_filenames.append("/tmp/%05d" % index + ".png")
----
2018-01-10 16:20:30.498341: Normal input read started with size=100
2018-01-10 16:20:34.020450: Normal input read completed with size=100
2018-01-10 16:20:34.020602: Random input read started with size=100
2018-01-10 16:20:38.692906: Random input read completed with size=100

Note that the timings for reading the TIFF images do not include the time spent in numpy.random; only the time spent reading the images in read_in_seq is counted.
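The measurement described here, which times only the call that reads the images, could also be written with a small helper. This is a minimal sketch: `timed` is a hypothetical name that is not part of the code above, and it uses `time.perf_counter` in place of the printed `datetime` timestamps.

```python
import time

def timed(fn, *args):
    """Call fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic clock intended for measuring intervals
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

With the code above it would be used as `output, seconds = timed(read_in_seq, image_filenames, indices)`, so that generating the indices stays outside the measured interval.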

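Assuming we can only use a single thread, can anyone explain why Python reads TIFF images more slowly when a random generator is used to pick the image paths, compared with hard-coded random indices? For example, is it related to CPU floating-point support, hard disk seeks, operating system design, or something else?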
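One way to narrow down these candidates is to read the same set of files twice in a row and compare the two timings. The sketch below is only a diagnostic under stated assumptions: `time_two_passes` and its `read_fn` parameter are hypothetical names, and the indices are converted to plain Python ints so that the index type is the same as in the hard-coded list.

```python
import time

def time_two_passes(read_fn, filenames, indices):
    """Read the same files twice; return the elapsed seconds of each pass."""
    # Convert to plain ints so numpy integers and hard-coded ints behave alike.
    plain = [int(i) for i in indices]
    timings = []
    for _ in range(2):
        start = time.perf_counter()
        for i in plain:
            read_fn(filenames[i])
        timings.append(time.perf_counter() - start)
    return timings
```

For the setup in the question this would be called as `first, second = time_two_passes(plt.imread, image_filenames, numpy.random.choice(15000, 100))`. If the second pass is much faster than the first, that would suggest the difference comes from files already being held in the operating system's cache from an earlier read, rather than from how the indices were produced. This is a hypothesis to test, not a confirmed explanation.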

