从numpy数组列表创建numpy数组的Pythonic方法

>>> list_of_arrays = map(lambda x: x*ones(2), range(5)) >>> list_of_arrays [array([ 0., 0.]), array([ 1., 1.]), array([ 2., 2.]), array([ 3., 3.]), array([ 4., 4.])] >>> arr = array(list_of_arrays) >>> arr array([[ 0., 0.], [ 1., 1.], [ 2., 2.], [ 3., 3.], [ 4., 4.]])

3条回答

网友

1楼 · 编辑于 2024-05-14 04:21:50

假设您知道最终数组arr永远不会大于5000x10。然后，您可以预先分配一个最大大小的数组，并将数据填充为通过循环，然后使用arr.resize将其缩减为退出循环后发现大小。

下面的测试表明这样做比构建中间不管数组的最终大小如何，python都会列出。

另外，arr.resize取消分配未使用的内存，因此最终（虽然可能不是中间）内存占用比python_lists_to_array使用的内存占用更小。

这表明numpy_all_the_way速度更快：

% python -mtimeit -s"import test" "test.numpy_all_the_way(100)"
100 loops, best of 3: 1.78 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(1000)"
100 loops, best of 3: 18.1 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(5000)"
10 loops, best of 3: 90.4 msec per loop

% python -mtimeit -s"import test" "test.python_lists_to_array(100)"
1000 loops, best of 3: 1.97 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(1000)"
10 loops, best of 3: 20.3 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(5000)"
10 loops, best of 3: 101 msec per loop

这表明numpy_all_the_way使用更少的内存：

% test.py
Initial memory usage: 19788
After python_lists_to_array: 20976
After numpy_all_the_way: 20348

测试.py：

import numpy as np
import os


def memory_usage():
    pid = os.getpid()
    return next(line for line in open('/proc/%s/status' % pid).read().splitlines()
                if line.startswith('VmSize')).split()[-2]

N, M = 5000, 10


def python_lists_to_array(k):
    list_of_arrays = list(map(lambda x: x * np.ones(M), range(k)))
    arr = np.array(list_of_arrays)
    return arr


def numpy_all_the_way(k):
    arr = np.empty((N, M))
    for x in range(k):
        arr[x] = x * np.ones(M)
    arr.resize((k, M))
    return arr

if __name__ == '__main__':
    print('Initial memory usage: %s' % memory_usage())
    arr = python_lists_to_array(5000)
    print('After python_lists_to_array: %s' % memory_usage())
    arr = numpy_all_the_way(5000)
    print('After numpy_all_the_way: %s' % memory_usage())

网友

2楼 · 编辑于 2024-05-14 04:21:50

甚至比@Gill Bates的答案还要简单，这里有一行代码：

np.stack(list_of_arrays, axis=0)

网友

3楼 · 编辑于 2024-05-14 04:21:50

方便的方式，使用^{}。我相信这比@unutbu的答案还要快：

In [32]: import numpy as np 

In [33]: list_of_arrays = list(map(lambda x: x * np.ones(2), range(5)))

In [34]: list_of_arrays
Out[34]: 
[array([ 0.,  0.]),
 array([ 1.,  1.]),
 array([ 2.,  2.]),
 array([ 3.,  3.]),
 array([ 4.,  4.])]

In [37]: shape = list(list_of_arrays[0].shape)

In [38]: shape
Out[38]: [2]

In [39]: shape[:0] = [len(list_of_arrays)]

In [40]: shape
Out[40]: [5, 2]

In [41]: arr = np.concatenate(list_of_arrays).reshape(shape)

In [42]: arr
Out[42]: 
array([[ 0.,  0.],
       [ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.],
       [ 4.,  4.]])

相关问题更多 >

编程相关推荐

热门问题

热门文章