Building up an array in numpy/scipy by iteration in Python?

26 votes

4 answers

50989 views

Asked 2025-04-15 21:37

I often need to build up an array by iterating over some data, for example:

import numpy as np

my_array = []
for n in range(1000):
    # do operation, get value
    value = n ** 2  # placeholder for the real operation
    my_array.append(value)
# cast to array
my_array = np.array(my_array)

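I find that I have to build a list first and then convert it to an array with "array". Is there a way to avoid these steps? The conversion calls clutter my code. How can I build up "my_array" iteratively so that it is an array from the start?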

4 Answers

The recommended approach is to preallocate the array before the loop and then fill it in using slicing and indexing.

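import numpy

# the shape must be an int or a tuple, e.g. (1000, 1000) for a 2-D array
my_array = numpy.zeros(1000)
for i in range(1000):
    # for a 1-D array
    my_array[i] = functionToGetValue(i)
    # OR, for a 2-D array, to fill an entire row
    # my_array[i, :] = functionToGetValue(i)
    # OR, for a 2-D array, to fill an entire column
    # my_array[:, i] = functionToGetValue(i)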

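numpy does provide an array.resize() method, but calling it inside a loop is slow because the memory has to be reallocated on every call. If you really need the flexibility, the only option is to build the array from a list.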

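EDIT: If you are worried about allocating too much memory for your data, use the method above to over-allocate, and once the loop is done, lop off the unused part with array.resize(). This is far faster than reallocating the array over and over inside the loop.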
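As a minimal sketch of the over-allocate-then-trim idea: the function name `collect_even_squares`, its `capacity` parameter, and the "keep only even n" condition are all made up for illustration. Note that this sketch trims with a slice plus `copy()` rather than `array.resize()`; that is a substitution on my part, chosen because it avoids the reference checks that an in-place resize performs.

```python
import numpy as np

def collect_even_squares(limit, capacity):
    """Fill a preallocated buffer, then trim the unused tail."""
    buf = np.zeros(capacity, dtype=np.int64)  # over-allocate up front
    count = 0
    for n in range(limit):
        if n % 2 == 0:  # hypothetical condition: only some values are kept
            buf[count] = n * n
            count += 1
    # buf[:count] is a view onto the big buffer; copy() makes a compact array
    return buf[:count].copy()

result = collect_even_squares(10, capacity=10)
print(result)  # [ 0  4 16 36 64]
```

The buffer here is sized for the worst case (every value kept), so no reallocation happens inside the loop at all.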

EDIT: In response to @user248237's comment, assuming you know one of the array's dimensions (for simplicity's sake):

import numpy

my_array = numpy.zeros((10000, SOMECONSTANT))

for i in range(someVariable):
    if i >= my_array.shape[0]:
        # out of room: double the number of rows
        my_array.resize((my_array.shape[0] * 2, SOMECONSTANT))

    my_array[i, :] = someFunction()

# lop off extra bits with resize() here

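The general principle is "allocate more than you think you'll need, and if things change, resize the array as few times as possible". Doubling the size may look excessive, but it is in fact the strategy used by several data structures in the standard libraries of other languages (java.util.Vector does this by default, for example, and I believe several implementations of std::vector in C++ do as well).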
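The doubling strategy can be sketched as a small wrapper class. `GrowableArray` is a hypothetical helper written for this illustration, not something NumPy provides, and it assumes a 1-D buffer with an initial capacity of at least 1. It grows by allocating a new block and copying, rather than calling `resize()` in place.

```python
import numpy as np

class GrowableArray:
    """Hypothetical helper: append-only 1-D buffer that doubles when full."""

    def __init__(self, capacity=4, dtype=float):
        self._data = np.empty(capacity, dtype=dtype)
        self._size = 0

    def append(self, value):
        if self._size == self._data.shape[0]:
            # Buffer is full: allocate twice the capacity and copy over
            bigger = np.empty(self._data.shape[0] * 2, dtype=self._data.dtype)
            bigger[:self._size] = self._data
            self._data = bigger
        self._data[self._size] = value
        self._size += 1

    def capacity(self):
        return self._data.shape[0]

    def to_array(self):
        # Return only the filled part, as an independent array
        return self._data[:self._size].copy()

g = GrowableArray(capacity=4, dtype=np.int64)
for n in range(10):
    g.append(n * n)
print(g.to_array())
```

Appending 10 values to a buffer that starts at capacity 4 triggers only two reallocations (4 to 8, then 8 to 16), which is the point of doubling: the number of copies grows logarithmically with the final size.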

Answered 2025-04-15 by Python大师


NumPy provides a function called 'fromiter':

import numpy as np

def myfunc(n):
    for i in range(n):
        yield i**2


np.fromiter(myfunc(5), dtype=int)

which yields

array([ 0,  1,  4,  9, 16])
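When the number of items is known in advance, `np.fromiter` also accepts a `count` argument, which lets it allocate the output once instead of growing it as items arrive. A small sketch, where the generator name `squares` and the length 1000 are illustrative choices:

```python
import numpy as np

def squares(n):
    # Generator standing in for "do operation, get value"
    for i in range(n):
        yield i ** 2

# count tells fromiter how many items to read, so it can preallocate
arr = np.fromiter(squares(1000), dtype=np.int64, count=1000)
print(arr.shape, arr[:5])
```

If the iterator produces fewer items than `count`, `fromiter` raises an error, so `count` should only be passed when the length is really known.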

Answered 2025-04-15 by Python大师



If I understand your question correctly, this should do what you want:

import numpy as NP

# the array passed into your function
ax = NP.random.randint(10, 99, 20).reshape(5, 4)

# just define a function to operate on some data
fnx = lambda x : NP.sum(x)**2

# apply the function directly to the numpy array
new_row = NP.apply_along_axis(func1d=fnx, axis=0, arr=ax)

# 'append' the new values to the original array
new_row = new_row.reshape(1,4)
ax = NP.vstack((ax, new_row))
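To make the result easier to check, here is the same idea with a deterministic input instead of random integers. The input `np.arange(20).reshape(5, 4)` and the name `col_sum_squared` are illustrative choices, not part of the answer above.

```python
import numpy as np

ax = np.arange(20).reshape(5, 4)  # rows 0..4, columns 0..3

def col_sum_squared(col):
    # Square of the sum of one column
    return np.sum(col) ** 2

# axis=0 applies the function to each column in turn
new_row = np.apply_along_axis(col_sum_squared, 0, ax)

# Stack the new values under the original array as an extra row
ax = np.vstack((ax, new_row.reshape(1, 4)))
print(ax[-1])  # [1600 2025 2500 3025]
```

Keep in mind that `vstack` allocates a new array and copies everything on each call, so calling it repeatedly inside a loop has the same cost problem that preallocation is meant to avoid.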

Answered 2025-04-15 by Python大师

