Memory error when multiplying CSR sparse matrices

Posted 2024-04-26 14:51:03

I created a document-term matrix using scikit-learn's TfidfVectorizer.

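from sklearn.feature_extraction.text import TfidfVectorizer

# normalization and min_doc_freq are set earlier (not shown)
tfidf = TfidfVectorizer(use_idf=True,
                        norm=normalization,
                        min_df=min_doc_freq)

# text is the collection of documents
dtm = tfidf.fit_transform(text)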

This gives a 128111 x 3469 sparse matrix in CSR format with 1865094 stored elements. I want to multiply it by its transpose, but I get a MemoryError every time.

The matrix is 128111 x 3469, so the result should be 128111 x 128111, which doesn't seem that big.
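For scale, here is a rough back-of-the-envelope calculation of what a 128111 x 128111 result could cost. It is only a sketch: it assumes float64 values (the TfidfVectorizer default) and 8-byte column indices, and it shows the worst case where every cell is nonzero. The real product may be much sparser, but how sparse is not known from the numbers above.

```python
# Worst-case size of a 128111 x 128111 result (assumptions noted inline).
n_docs = 128111
n_entries = n_docs * n_docs      # number of cells in dtm @ dtm.T

BYTES_FLOAT64 = 8                # assumed value dtype: float64
BYTES_INT64 = 8                  # assumed index dtype: int64

# Dense storage: one float64 per cell.
dense_bytes = n_entries * BYTES_FLOAT64
# CSR worst case: every cell nonzero, so each needs a value and a column index.
csr_worst_bytes = n_entries * (BYTES_FLOAT64 + BYTES_INT64)

print(f"cells:          {n_entries:,}")
print(f"dense float64:  {dense_bytes / 1e9:.1f} GB")
print(f"CSR worst case: {csr_worst_bytes / 1e9:.1f} GB")
```

That is about 16.4 billion cells, or roughly 131 GB if stored densely as float64, so a result that is even moderately dense would not fit in 84 GB.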

I'm using Python 3.7.2, 64-bit. At the time of writing, the VM I'm working on has 84 GB of RAM available (out of more than 125 GB total).

I tried each of the following, and every one fails with the same error:

sim = dtm * dtm.T  # also tried dtm.transpose()

sim = dtm @ dtm.T 

sim = dtm.dot(dtm.T)

I expected to get a sparse matrix back, but I get a MemoryError instead.

~/utilities/anaconda3/lib/python3.7/site-packages/scipy/sparse/compressed.py in _mul_sparse_matrix(self, other)
    500 maxval=nnz)
    501 indptr = np.asarray(indptr, dtype=idx_dtype)
--> 502 indices = np.empty(nnz, dtype=idx_dtype)
    503 data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
    504

MemoryError:
