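<p>Suppose you know that the final array <code>arr</code> will never be larger than 5000x10.
Then you can pre-allocate an array of the maximum size, fill it with data
as you go through the loop, and then use <code>arr.resize</code> to cut it down to
the size discovered once the loop has finished.</p>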
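<p>A minimal sketch of that pattern, for the case where the number of rows is only known once the loop ends. The names <code>collect_rows</code>, <code>MAX_ROWS</code> and <code>COLS</code> are hypothetical and chosen for illustration, and <code>refcheck=False</code> is an assumption that is safe here only because nothing else shares the array's buffer:</p>

```python
import numpy as np

MAX_ROWS, COLS = 5000, 10  # hypothetical upper bound, assumed known in advance


def collect_rows(row_source):
    """Fill a preallocated array from an iterable of rows, then trim it."""
    arr = np.empty((MAX_ROWS, COLS))
    count = 0
    for row in row_source:
        arr[count] = row  # raises IndexError if the bound is exceeded
        count += 1
    # Shrink in place to the number of rows actually written.
    # refcheck=False skips the reference-count check; that is only safe
    # because no other array or view refers to this buffer.
    arr.resize((count, COLS), refcheck=False)
    return arr


rows = (i * np.ones(COLS) for i in range(37))
result = collect_rows(rows)
print(result.shape)  # (37, 10)
```

<p>If the bound can be exceeded, the loop fails with an <code>IndexError</code> rather than silently dropping rows, so the bound has to be a real guarantee.</p>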
<p>The tests below suggest that doing so is faster than building intermediate
Python lists, no matter what the final size of the array is.</p>
<p>Also, <code>arr.resize</code> de-allocates the unused memory, so the final (though maybe not the intermediate) memory footprint is smaller than what <code>python_lists_to_array</code> uses.</p>
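<p>As a rough illustration of why <code>resize</code> matters for the footprint, the sketch below compares it with plain slicing. A slice is only a view that keeps the full-size buffer alive, while an in-place <code>resize</code> leaves the array owning a smaller buffer. Whether the freed bytes are handed back to the operating system depends on the allocator, so treat the numbers as buffer sizes, not as process memory. The variable names are illustrative:</p>

```python
import numpy as np

# Option 1: shrink in place. The array ends up owning a smaller buffer.
arr = np.empty((5000, 10))
bytes_before = arr.nbytes  # 5000 * 10 * 8 bytes for float64
arr.resize((100, 10), refcheck=False)  # safe: nothing else references arr
bytes_after = arr.nbytes

# Option 2: slice. The result is a view; the 5000x10 buffer stays allocated
# for as long as the view is alive.
full = np.empty((5000, 10))
view = full[:100]

print(bytes_before, bytes_after)  # 400000 8000
print(arr.flags.owndata)          # True
print(view.base is full)          # True
```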
<p>This shows that <code>numpy_all_the_way</code> is faster:</p>
<pre><code>% python -mtimeit -s"import test" "test.numpy_all_the_way(100)"
100 loops, best of 3: 1.78 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(1000)"
100 loops, best of 3: 18.1 msec per loop
% python -mtimeit -s"import test" "test.numpy_all_the_way(5000)"
10 loops, best of 3: 90.4 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(100)"
1000 loops, best of 3: 1.97 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(1000)"
10 loops, best of 3: 20.3 msec per loop
% python -mtimeit -s"import test" "test.python_lists_to_array(5000)"
10 loops, best of 3: 101 msec per loop
</code></pre>
<p>This shows that <code>numpy_all_the_way</code> uses less memory:</p>
<pre><code>% test.py
Initial memory usage: 19788
After python_lists_to_array: 20976
After numpy_all_the_way: 20348
</code></pre>
<p>test.py:</p>
<pre><code>import numpy as np
import os


def memory_usage():
    # Virtual memory size (VmSize, in kB) of this process, read from /proc (Linux only).
    pid = os.getpid()
    return next(line for line in open('/proc/%s/status' % pid).read().splitlines()
                if line.startswith('VmSize')).split()[-2]


N, M = 5000, 10


def python_lists_to_array(k):
    # Build k rows in an intermediate Python list, then convert once.
    list_of_arrays = list(map(lambda x: x * np.ones(M), range(k)))
    arr = np.array(list_of_arrays)
    return arr


def numpy_all_the_way(k):
    # Preallocate the maximum size, fill the first k rows, then shrink in place.
    arr = np.empty((N, M))
    for x in range(k):
        arr[x] = x * np.ones(M)
    arr.resize((k, M))
    return arr


if __name__ == '__main__':
    print('Initial memory usage: %s' % memory_usage())
    arr = python_lists_to_array(5000)
    print('After python_lists_to_array: %s' % memory_usage())
    arr = numpy_all_the_way(5000)
    print('After numpy_all_the_way: %s' % memory_usage())
</code></pre>