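<p>To answer the original question (for those who, like me, found this question looking for copy-pasta), here is a solution using multiprocessing, based on @hpaulj's suggestion of converting to a <code>lil_matrix</code> and iterating over the rows:</p>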
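<pre><code>from functools import partial
from multiprocessing import Pool


def _top_k(args, k):
    """
    Helper function to process a single row of top_k
    """
    data, row = args
    # Keep the k (value, column) pairs with the largest values
    pairs = sorted(zip(data, row), reverse=True)[:k]
    # lil_matrix expects the column indices of each row to stay sorted
    pairs.sort(key=lambda p: p[1])
    return [d for d, _ in pairs], [c for _, c in pairs]


def top_k(m, k):
    """
    Keep only the top k elements of each row in a csr_matrix
    """
    ml = m.tolil()
    # partial binds k, since _top_k cannot see top_k's local variables
    with Pool() as p:
        ms = p.map(partial(_top_k, k=k), zip(ml.data, ml.rows))
    # Write the trimmed lists back into lil_matrix's per-row object arrays
    for i, (data, row) in enumerate(ms):
        ml.data[i] = data
        ml.rows[i] = row
    return ml.tocsr()
</code></pre>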
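<p>For small matrices the start-up and pickling overhead of <code>Pool</code> may outweigh any parallel gain, so a single-process version of the same <code>lil_matrix</code> row loop can be handy for comparison or testing. This is a sketch that assumes NumPy and SciPy are installed; <code>top_k_serial</code> is a hypothetical name, not part of the original answer. Ties between equal values are broken in favour of the higher column index, because the reverse sort compares <code>(value, column)</code> tuples.</p>
<pre><code>import numpy as np
from scipy.sparse import csr_matrix


def top_k_serial(m, k):
    """
    Hypothetical single-process variant: keep only the k largest
    stored values in each row of a csr_matrix.
    """
    ml = m.tolil()
    for i, (data, row) in enumerate(zip(ml.data, ml.rows)):
        # k largest (value, column) pairs, then restore column order
        pairs = sorted(zip(data, row), reverse=True)[:k]
        pairs.sort(key=lambda p: p[1])
        ml.data[i] = [d for d, _ in pairs]
        ml.rows[i] = [c for _, c in pairs]
    return ml.tocsr()


m = csr_matrix(np.array([[5, 1, 3, 0],
                         [0, 2, 0, 7],
                         [4, 4, 9, 1]]))
print(top_k_serial(m, 2).toarray())
# [[5 0 3 0]
#  [0 2 0 7]
#  [0 4 9 0]]
</code></pre>
<p>In the last row the two 4s tie; the one in column 1 is kept and the one in column 0 is dropped.</p>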