Using the += operator raises a MemoryError when the matrix is too large

Z = np.where(np.random.multinomial(1,[1./ntopics]*ntopics,size = M*N )==1)[1]
Z

array([[1, 3, 0, ..., 5, 3, 1],
       [3, 5, 0, ..., 5, 1, 2],
       [4, 5, 4, ..., 1, 3, 5],
       ...,
       [1, 2, 1, ..., 0, 3, 4],
       [0, 5, 2, ..., 2, 5, 0],
       [2, 3, 2, ..., 4, 1, 5]])

# Option 1
for m in xrange(M):
    NZM[Z_index,m] += 1

# Option 2
NZM[Z_index,:] += 1

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-88-087ab1ede05d> in <module>()
      2 # a memory error
      3 
----> 4 NZM[Z_index,:] += 1

MemoryError:

1 Answer

User

#1 · Posted 2024-06-02 08:08:28

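My question is a duplicate of the question here, but it comes from what I believe is a distinct query, one that people searching for errors caused by a large number of repeated indices may find more easily.

A simple sanity check shows that += was not doing what I thought it was doing. I had assumed that, given an index containing the same row several times, += would add one to that row once for each time it occurs in the index.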

import numpy as np
import pandas as pd

# 5x5 array of ones (shape chosen to match the output shown below)
NWZ = np.zeros((5,5), dtype=np.float64) + 1

index = np.repeat([0,3], [1, 3], axis=0)

index

array([0, 3, 3, 3])

NWZ[index,:] += 1

NWZ

array([[ 2.,  2.,  2.,  2.,  2.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 1.,  1.,  1.,  1.,  1.]])

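We can see that this is not the case: multiple instances of the same row in the index add only one to the original row. Because += performs an "in-place" operation, I had assumed this operation would return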

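array([[ 2.,  2.,  2.,  2.,  2.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 4.,  4.,  4.,  4.,  4.],
       [ 1.,  1.,  1.,  1.,  1.]])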

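However, calling .__iadd__(1) explicitly shows that the addition is not applied cumulatively as the index is traversed. NWZ[index,:] is a copy with one row per index entry, and each row of that copy is incremented exactly once (the output below assumes NWZ is again an array of ones).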

NWZ[index,:].__iadd__(1)

array([[ 2.,  2.,  2.,  2.,  2.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 2.,  2.,  2.,  2.,  2.]])
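The shape of that result also hints at where the MemoryError in the question may come from: fancy indexing materializes a copy with one row per index entry, so a long index with many repeats can produce a temporary far larger than the array itself. A minimal sketch of this, where the 5x5 shape and the length-1000 index are illustrative assumptions rather than values from the question:

```python
import numpy as np

NWZ = np.ones((5, 5), dtype=np.float64)
index = np.repeat([0, 3], [1, 3])  # array([0, 3, 3, 3])

# Fancy indexing returns a new array, one row per entry in index.
selected = NWZ[index, :]
print(selected.shape)                   # (4, 5)
print(np.shares_memory(selected, NWZ))  # False

# Modifying the copy leaves the original untouched.
selected += 1
print(NWZ[3, 0])                        # 1.0

# With many repeated indices the temporary outgrows the source array.
big_index = np.zeros(1000, dtype=np.intp)  # hypothetical long index
temp_bytes = NWZ[big_index, :].nbytes
print(temp_bytes, NWZ.nbytes)           # 40000 200
```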

You can go here for an intuitive explanation of why this does not happen (and, as that user argues, should not happen).
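When accumulation over repeated indices is what you actually want, NumPy's unbuffered ufunc method np.add.at is one way to get it. The sketch below is not part of the original answer; it reuses the sanity-check setup, with the 5x5 shape taken from the printed output.

```python
import numpy as np

NWZ = np.ones((5, 5), dtype=np.float64)
index = np.repeat([0, 3], [1, 3])  # array([0, 3, 3, 3])

# Buffered fancy-index +=: each distinct row is incremented only once.
buffered = NWZ.copy()
buffered[index, :] += 1

# Unbuffered: np.add.at adds once per occurrence of a row in index.
accumulated = NWZ.copy()
np.add.at(accumulated, index, 1)

print(buffered[:, 0])     # [2. 1. 1. 2. 1.]
print(accumulated[:, 0])  # [2. 1. 1. 4. 1.]
```

Row 3 appears three times in index, so it ends at 4 with np.add.at but only at 2 with the plain += form.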

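An alternative solution to my problem is to first build a frequency table recording how many times row n appears in my repeated index. Then, since I am only doing addition, I add those frequencies to their corresponding rows.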

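# Note: itemfreq was deprecated in SciPy 1.0 and has since been removed,
# so this import requires an older SciPy release.
from scipy.stats import itemfreq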

index_counts = itemfreq(index)

N = len(index_counts[:,1])
NWZ[index_counts[:,0].astype(int),:] += index_counts[:,1].reshape(N,1)
NWZ

array([[ 2.,  2.,  2.,  2.,  2.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 4.,  4.,  4.,  4.,  4.],
       [ 1.,  1.,  1.,  1.,  1.]])
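Since itemfreq is not available in recent SciPy releases, the same frequency-table idea can be sketched with np.unique and return_counts=True. This is a substitute for the call used above, not the original author's code, and the 5x5 array of ones is assumed from the printed output.

```python
import numpy as np

NWZ = np.ones((5, 5), dtype=np.float64)
index = np.repeat([0, 3], [1, 3])  # array([0, 3, 3, 3])

# Distinct rows and how often each one occurs in the repeated index.
rows, counts = np.unique(index, return_counts=True)

# Add each count to its row; the new axis broadcasts it across columns.
NWZ[rows, :] += counts[:, np.newaxis]

print(rows, counts)  # [0 3] [1 3]
print(NWZ[:, 0])     # [2. 1. 1. 4. 1.]
```

Because rows contains each index only once, the temporary created by NWZ[rows, :] has one row per distinct index rather than one per occurrence, which is what keeps memory use down for a heavily repeated index.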
