Converting a Python for-loop into NumPy operations

Published 2024-04-26 03:12:24


I have a NumPy array full of indices:

import numpy as np

size = 100000
idx = np.random.randint(0, size, size=size)

I have a simple function that loops over the indices and does the following:

out = np.zeros(size, dtype=int)  # np.int was removed in NumPy 1.24; the built-in int is equivalent

for i in range(size):
    j = idx[i]
    out[min(i, j)] = out[min(i, j)] + 1
    out[max(i, j)] = out[max(i, j)] - 1

return np.cumsum(out)

When size is large this is quite slow, and I am hoping to find a faster way to do it. I tried the following, but it is not quite right:

out = np.zeros(size, dtype=int)
i = np.arange(size)
j = idx[i]
mini = np.minimum(i, j)
maxi = np.maximum(i, j)

out[mini] = out[mini] + 1
out[maxi] = out[maxi] - 1

return np.cumsum(out)
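
The attempt above most likely falls short because fancy-index assignment is buffered: when an index appears more than once in mini or maxi, out[mini] = out[mini] + 1 increments that position only once. The following is a minimal sketch, assuming a small made-up idx chosen to contain repeats, that contrasts the buffered assignment with the unbuffered np.add.at and np.subtract.at. It illustrates the cause and is not the fastest fix.

```python
import numpy as np

# Small hypothetical input; chosen so that mini and maxi contain repeated indices.
idx = np.array([2, 0, 2, 1])
size = idx.size

i = np.arange(size)
mini = np.minimum(i, idx)   # [0, 0, 2, 1]
maxi = np.maximum(i, idx)   # [2, 1, 2, 3]

# Buffered fancy-index assignment: a repeated index is incremented only once.
buffered = np.zeros(size, dtype=int)
buffered[mini] = buffered[mini] + 1

# Unbuffered ufunc.at: every occurrence of an index is accumulated.
out = np.zeros(size, dtype=int)
np.add.at(out, mini, 1)
np.subtract.at(out, maxi, 1)

result = np.cumsum(out)
print(buffered)  # index 0 holds 1 here, although it occurs twice in mini
print(result)
```

ufunc.at is typically slower than counting-based approaches, so it serves here mainly to show where the vectorized attempt and the loop diverge.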

2 answers

This seems to give the same answer as the for-loop:

i = np.arange(size)
j = idx[i]
mini = np.minimum(i, j)
maxi = np.maximum(i, j)

unique_mini, mini_counts = np.unique(mini, return_counts=True)
unique_maxi, maxi_counts = np.unique(maxi, return_counts=True)

out = np.zeros(size, dtype=int)
out[unique_mini] = out[unique_mini] + mini_counts
out[unique_maxi] = out[unique_maxi] - maxi_counts

return np.cumsum(out)

We can leverage np.bincount, which counts how many times each index occurs:

R = np.arange(size)
out = np.bincount(np.minimum(R,idx),minlength=size)
out -= np.bincount(np.maximum(R,idx),minlength=size)
final_out = out.cumsum()
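
To check that the bincount approach matches the original loop, here is a small sketch that wraps both in functions and compares them on random input. The function names loop_version and bincount_version, and the test size of 1000, are illustrative assumptions and not part of the original answers.

```python
import numpy as np

def loop_version(idx):
    """Reference implementation: the loop from the question."""
    size = idx.size
    out = np.zeros(size, dtype=int)
    for i in range(size):
        j = idx[i]
        out[min(i, j)] += 1
        out[max(i, j)] -= 1
    return np.cumsum(out)

def bincount_version(idx):
    """Count occurrences of each min/max index, then take the running sum."""
    size = idx.size
    R = np.arange(size)
    out = np.bincount(np.minimum(R, idx), minlength=size)
    out -= np.bincount(np.maximum(R, idx), minlength=size)
    return out.cumsum()

rng = np.random.default_rng(0)
idx = rng.integers(0, 1000, size=1000)

same = np.array_equal(loop_version(idx), bincount_version(idx))
print(same)
```

The minlength=size argument matters: without it, bincount returns an array only as long as the largest index seen plus one, and the subtraction could fail on a shape mismatch.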

Timings:

All of the posted solutions finish with cumsum, so let's time them with that last step skipped:

In [25]: np.random.seed(0)
    ...: size = 100000
    ...: idx = np.random.randint(0, size, size=size)

# From this post
In [27]: %%timeit
    ...: R = np.arange(size)
    ...: out = np.bincount(np.minimum(R,idx),minlength=size)
    ...: out -= np.bincount(np.maximum(R,idx),minlength=size)
1000 loops, best of 3: 643 µs per loop

# @slaw's solution
In [28]: %%timeit
    ...: i = np.arange(size)
    ...: j = idx[i]
    ...: mini = np.minimum(i, j)
    ...: maxi = np.maximum(i, j)
    ...: 
    ...: unique_mini, mini_counts = np.unique(mini, return_counts=True)
    ...: unique_maxi, maxi_counts = np.unique(maxi, return_counts=True)
    ...: 
    ...: out = np.zeros(size, dtype=int)
    ...: out[unique_mini] = out[unique_mini] + mini_counts
    ...: out[unique_maxi] = out[unique_maxi] - maxi_counts
100 loops, best of 3: 13.3 ms per loop

# Loopy one from question
In [29]: %%timeit
    ...: out = np.zeros(size, dtype=int)
    ...: 
    ...: for i in range(size):
    ...:     j = idx[i]
    ...:     out[min(i, j)] = out[min(i, j)] + 1
    ...:     out[max(i, j)] = out[max(i, j)] - 1
10 loops, best of 3: 141 ms per loop
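
The timings above come from an IPython session. For readers without IPython, the following is a rough sketch of the same comparison using the standard-library timeit module. The function names with_bincount and with_unique are illustrative assumptions, and absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

np.random.seed(0)
size = 100000
idx = np.random.randint(0, size, size=size)

def with_bincount():
    R = np.arange(size)
    out = np.bincount(np.minimum(R, idx), minlength=size)
    out -= np.bincount(np.maximum(R, idx), minlength=size)
    return out

def with_unique():
    i = np.arange(size)
    mini = np.minimum(i, idx)
    maxi = np.maximum(i, idx)
    unique_mini, mini_counts = np.unique(mini, return_counts=True)
    unique_maxi, maxi_counts = np.unique(maxi, return_counts=True)
    out = np.zeros(size, dtype=int)
    # Safe with in-place ops here because the unique indices contain no repeats.
    out[unique_mini] += mini_counts
    out[unique_maxi] -= maxi_counts
    return out

# Best of 3 repeats, 10 calls each, reported as seconds per call.
t_bincount = min(timeit.repeat(with_bincount, number=10, repeat=3)) / 10
t_unique = min(timeit.repeat(with_unique, number=10, repeat=3)) / 10
print(f"bincount: {t_bincount * 1e3:.3f} ms, unique: {t_unique * 1e3:.3f} ms")
```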
