Why does the speed of saving a slice of an array with numpy.save depend on the orientation of the slice?

import os
import numpy as np
import time
import matplotlib.pyplot as plt

# take a slice of the data
def slice_data(roi):
    dic = {}
    data = np.zeros((512,512,256))
    dic['data'] = np.squeeze( data[roi[0]:roi[1]+1, roi[2]:roi[3]+1, roi[4]:roi[5]+1] )
    return dic

# save slices of the data
def save_slices(roi, save=False):
    var = 'data'
    for i in range(0,6):
        # iterate to simulate a time series of data
        a = slice_data(roi)[var]
        var_dir = 'save_test/'
        if not os.path.exists(var_dir): os.makedirs(var_dir)
        file = var_dir + '{0:04d}{1}'.format(i,'.npy')

        if save is True:
            np.save(file, a)

## define slices
roix=[256, 256, 0, 512, 0, 256] # yz plane slice
roiy=[0, 512, 256, 256, 0, 256] # xz plane slice
roiz=[0, 512, 0, 512, 128, 128] # xy plane slice

## Calculate slices and do not save the results
dtx = []
dty = []
dtz = []
for i in range(100):
    time0 = time.time()
    save_slices(roix)
    time1 = time.time()
    dtx.append(time1-time0)

    time0 = time.time()
    save_slices(roiy)
    time1 = time.time()
    dty.append(time1-time0)

    time0 = time.time()
    save_slices(roiz)
    time1 = time.time()
    dtz.append(time1-time0)

plt.figure(1)
plt.plot(dtx)
plt.plot(dty)
plt.plot(dtz)
plt.title('time to run code without saving data')
print('mean time x-slice: {} sec'.format(np.mean(dtx)))
print('mean time y-slice: {} sec'.format(np.mean(dty)))
print('mean time z-slice: {} sec'.format(np.mean(dtz)))

## Calculate slices and do save the results
dtx = []
dty = []
dtz = []
for i in range(100):
    time0 = time.time()
    save_slices(roix, save=True)
    time1 = time.time()
    dtx.append(time1-time0)

    time0 = time.time()
    save_slices(roiy, save=True)
    time1 = time.time()
    dty.append(time1-time0)

    time0 = time.time()
    save_slices(roiz, save=True)
    time1 = time.time()
    dtz.append(time1-time0)

plt.figure(2)
plt.plot(dtx)
plt.plot(dty)
plt.plot(dtz)
plt.title('time to run code and save data')
print('mean time x-slice: {} sec'.format(np.mean(dtx)))
print('mean time y-slice: {} sec'.format(np.mean(dty)))
print('mean time z-slice: {} sec'.format(np.mean(dtz)))

plt.show()  # display both figures when run as a script

2 answers

User

Answer 1 · edited 2024-05-23 14:20:44

The reason is that NumPy stores data in row-major order by default. If you change

data = np.zeros((512,512,256))

to

# order F means column major
data = np.zeros((512,512,256), order='F')

you will find that saving the x-slices now takes the longest.
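A minimal sketch of this effect, assuming a smaller 64×64×32 array so it runs quickly (the helper `slice_flags` is hypothetical, not part of the question's code). It reports whether the x-, y- and z-slices are C- or F-contiguous when the source array is created in C order versus F order.

```python
import numpy as np

def slice_flags(order):
    """Hypothetical helper: (c_contiguous, f_contiguous) for x-, y- and z-slices.

    The shape (64, 64, 32) is an assumed, smaller stand-in for (512, 512, 256).
    """
    data = np.zeros((64, 64, 32), order=order)
    slices = {
        "x": data[32, :, :],  # yz plane, like roix
        "y": data[:, 32, :],  # xz plane, like roiy
        "z": data[:, :, 16],  # xy plane, like roiz
    }
    return {k: (v.flags.c_contiguous, v.flags.f_contiguous) for k, v in slices.items()}

c_flags = slice_flags("C")
f_flags = slice_flags("F")
print("C order:", c_flags)
print("F order:", f_flags)
```

With the default C order only the x-slice is contiguous; with `order='F'` only the z-slice is, and the x-slice becomes fully strided, which is consistent with the x-slices turning into the slow case.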

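When you save many slices of an array, choosing a memory order that matches the slicing direction gives better performance. A more detailed explanation follows.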

Take the following matrix as an example (from the NumPy glossary):

m = [[1, 2, 3],
     [4, 5, 6]]

If this is represented in memory in row-major order (C order in NumPy jargon), it is laid out like this:

[1, 2, 3, 4, 5, 6]

If the matrix is represented in memory in column-major order (F, for Fortran order), it is laid out like this:

[1, 4, 2, 5, 3, 6]

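Now if you index this array with m[:,2] you get [3, 6], and with m[1,:] you get [4, 5, 6]. Looking back at the memory layouts, you will see that [3, 6] is contiguous in the column-major representation, while [4, 5, 6] is contiguous in the row-major representation.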
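The two layouts can be checked directly in NumPy. This sketch builds a C-ordered and an F-ordered copy of `m` and uses `ravel(order="K")`, which reads elements in the order they sit in memory, plus the `flags.c_contiguous` attribute of the row and column views.

```python
import numpy as np

m = [[1, 2, 3],
     [4, 5, 6]]

m_c = np.array(m, order="C")  # row-major copy
m_f = np.array(m, order="F")  # column-major copy

# ravel(order="K") returns the elements in memory order
print(m_c.ravel(order="K"))  # [1 2 3 4 5 6]
print(m_f.ravel(order="K"))  # [1 4 2 5 3 6]

# A row of the C array and a column of the F array are contiguous views
print(m_c[1, :].flags.c_contiguous)  # True
print(m_f[:, 2].flags.c_contiguous)  # True
# The same column taken from the C array is strided
print(m_c[:, 2].flags.c_contiguous)  # False
```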

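When reading many elements from an array (as happens when saving it), reading contiguous values is much faster, because it makes good use of the CPU cache, which is 1-2 orders of magnitude faster than reading from main memory.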
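A rough way to observe this on your own machine, assuming a smaller 128×128×128 array and using `.copy()` as a stand-in for the reads that `np.save` performs (this is a proxy, not what `np.save` does internally). Both slices hold the same number of elements; the timings depend entirely on the hardware, so no particular numbers are implied.

```python
import timeit
import numpy as np

# Assumed stand-in for the question's array: 128**3 float64 values (about 16 MB).
# np.ones writes every element, so the memory pages are really populated.
data = np.ones((128, 128, 128))

contiguous = data[64, :, :]  # 128 x 128 elements stored in one block
strided = data[:, :, 64]     # 128 x 128 elements, one every 128 * 8 bytes

# .copy() must read every element; best of 5 runs of 50 copies each
t_contig = min(timeit.repeat(contiguous.copy, number=50, repeat=5))
t_strided = min(timeit.repeat(strided.copy, number=50, repeat=5))
print(f"contiguous: {t_contig:.4f} s, strided: {t_strided:.4f} s (50 copies each)")
```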

User

Answer 2 · edited 2024-05-23 14:20:44

Short answer

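Only the roix array is contiguous. Transferring it from memory to the CPU over the bus is therefore faster than transferring non-contiguous data (because the bus moves data in blocks and caches them).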

By making the slice C-contiguous with np.save(file, np.asarray(a, order='C')), you can get a small improvement (about 5% for roiz and about 40% for roiy).
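A small sketch of this fix, assuming a smaller 64×64×32 array and in-memory `io.BytesIO` buffers instead of files. It checks that the C-contiguous copy produces exactly the same `.npy` content as the original strided slice, so the change only affects how the data is laid out before writing, not what ends up on disk. It does not measure speed.

```python
import io
import numpy as np

data = np.arange(64 * 64 * 32, dtype=np.float64).reshape(64, 64, 32)
a = np.squeeze(data[:, :, 16:17])  # non-contiguous slice, like roiz
a_c = np.asarray(a, order='C')     # C-contiguous copy, as suggested above

buf_slice, buf_copy = io.BytesIO(), io.BytesIO()
np.save(buf_slice, a)
np.save(buf_copy, a_c)

print(a.flags.c_contiguous, a_c.flags.c_contiguous)  # False True
print(buf_slice.getvalue() == buf_copy.getvalue())   # True
```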

Profiling

You should use timeit to measure performance rather than a hand-rolled timing loop.
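Outside IPython/Jupyter, where the `%%timeit` magic is not available, the standard-library `timeit` module gives a similar measurement. This sketch times a hypothetical, smaller stand-in for `slice_data(roiz)` and reports mean ± standard deviation per loop over 7 runs of 100 loops, mirroring the "7 runs, 100 loops each" reported by `%%timeit -n 100`.

```python
import statistics
import timeit
import numpy as np

def slice_z():
    # Hypothetical, smaller stand-in for slice_data(roiz)
    data = np.zeros((64, 64, 32))
    return np.squeeze(data[:, :, 16:17])

# number=100 loops per run, repeat=7 runs (mirrors "%%timeit -n 100")
runs = timeit.repeat(slice_z, number=100, repeat=7)
per_loop = [t / 100 for t in runs]
print(f"{statistics.mean(per_loop) * 1e6:.1f} µs ± "
      f"{statistics.stdev(per_loop) * 1e6:.1f} µs per loop (7 runs, 100 loops each)")
```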

I ran these for you to show an example:

In one cell:

import os
import numpy as np
import time
import matplotlib.pyplot as plt

# take a slice of the data
def slice_data(roi):
    dic = {}
    data = np.zeros((512,512,256))
    dic['data'] = np.squeeze( data[roi[0]:roi[1]+1, roi[2]:roi[3]+1, roi[4]:roi[5]+1] )
    return dic


# save slices of the data
def save_slices(roi, save=False):
    var = 'data'
    for i in range(0,6):
        # iterate to simulate a time series of data
        a = slice_data(roi)[var]
        var_dir = 'save_test/'
        if not os.path.exists(var_dir): os.makedirs(var_dir)
        file = var_dir + '{0:04d}{1}'.format(i,'.npy')

        if save is True:
            np.save(file, a)


## define slices
roix=[256, 256, 0, 512, 0, 256] # yz plane slice
roiy=[0, 512, 256, 256, 0, 256] # xz plane slice
roiz=[0, 512, 0, 512, 128, 128] # xy plane slice

In other cells:

%%timeit -n 100
save_slices(roix) # 19.8 ms ± 285 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit -n 100
save_slices(roiy) # 20.5 ms ± 948 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit -n 100
save_slices(roiz) # 20 ms ± 345 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

With saving:

%%timeit -n 10 -r 3
save_slices(roix, True) # 32.7 ms ± 2.31 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)

%%timeit -n 10 -r 3
save_slices(roiy, True) # 101 ms ± 2.61 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)

%%timeit -n 10 -r 3
save_slices(roiz, True) # 1.9 s ± 21.1 ms per loop (mean ± std. dev. of 3 runs, 10 loops each)

As you had already noticed, the difference only shows up when saving! Let's dig into the np.save() method.

The np.save method

np.save handles the I/O stream and calls the write_array method, which is very fast for C-contiguous arrays (fast memory access).

Let's verify this hypothesis:

np.squeeze( np.zeros((512,512,256))[roix[0]:roix[1]+1, roix[2]:roix[3]+1, roix[4]:roix[5]+1] ).flags.c_contiguous # returns True

np.squeeze( np.zeros((512,512,256))[roiy[0]:roiy[1]+1, roiy[2]:roiy[3]+1, roiy[4]:roiy[5]+1] ).flags.c_contiguous # returns False

np.squeeze( np.zeros((512,512,256))[roiz[0]:roiz[1]+1, roiz[2]:roiz[3]+1, roiz[4]:roiz[5]+1] ).flags.c_contiguous # returns False

So this may explain the difference between roix and roiy/roiz.

Potential explanation for the difference between `roiy` and `roiz`: slower data transfer

Beyond this point I can only make assumptions: roiz seems to be more fragmented than roiy, which costs the write_array method a lot of time.
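The fragmentation difference itself is visible in the strides, even though this does not prove the cache-miss explanation. The sketch assumes a smaller 64×64×32 float64 array (the helper `contiguous_run` is hypothetical). For the question's full 512×512×256 array the same reasoning gives runs of 256 adjacent elements for the y-slice, while consecutive z-slice elements sit 2048 bytes apart.

```python
import numpy as np

# Assumed, smaller stand-in for the question's (512, 512, 256) float64 array
data = np.zeros((64, 64, 32))

y_slice = np.squeeze(data[:, 32:33, :])  # xz plane, like roiy
z_slice = np.squeeze(data[:, :, 16:17])  # xy plane, like roiz

def contiguous_run(a):
    """Hypothetical helper: number of adjacent elements in the innermost run."""
    return a.shape[-1] if a.strides[-1] == a.itemsize else 1

print(y_slice.strides, contiguous_run(y_slice))  # (16384, 8) 32
print(z_slice.strides, contiguous_run(z_slice))  # (16384, 256) 1
```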

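I can't test this myself right now, but this part could be verified with the perf command on Linux (for example, to see how much bus traffic there is or how many cache misses occur). If I had to make a wild guess, I would say the cache-miss rate is high because the data is not contiguous, so transferring the data from RAM to the CPU really does slow the process down.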

Other ways to handle storage

I haven't tried them, but there is a good question with some useful answers: best way to preserve numpy arrays on disk
