NumPy: for each element of one array, find its index in another array

70 votes

11 answers

69099 views

Asked 2025-04-17 06:58

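I have two 1-D arrays, x and y, one smaller than the other. I want to find the index in x of every element of y.

I have found two naive ways to do this: the first is slow, and the second is memory-intensive.

The slow way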

indices = []
for iy in y:
    # first position in x whose value equals iy
    indices.append(np.where(x == iy)[0][0])

The memory hog

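# Build two len(y)-by-len(x) grids: rows follow y, columns follow x
xe = np.outer([1,]*len(y), x)   # xe[j, i] == x[i]
ye = np.outer(y, [1,]*len(x))   # ye[j, i] == y[j]
# Column indices of the matches are positions in x, listed in y order
junk, indices = np.where(np.equal(xe, ye))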

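Is there a faster or less memory-intensive approach? Ideally the search would take advantage of the fact that we are looking up many things in an array rather than one, and so lends itself somewhat to parallelization. Bonus points if the solution does not assume that every element of y is actually in x.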

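Tags: performance comparison, memory optimization, parallel processing, array indexing, 1-D arrays, numpy, optimization, data lookup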

11 answers

How about this?

It assumes that every element of y is in x (and it will still return a result for elements that are not!), but it is much faster.

import numpy as np

# Generate some example data...
x = np.arange(1000)
np.random.shuffle(x)
y = np.arange(100)

# Actually perform the operation...
xsorted = np.argsort(x)
ypos = np.searchsorted(x[xsorted], y)
indices = xsorted[ypos]
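
The same lookup can also be written with the `sorter` argument of `np.searchsorted`, which avoids building the sorted copy `x[xsorted]`. What follows is only a sketch: the helper name `find_indices` is made up for illustration, and it still assumes that every element of `y` occurs in `x`.

```python
import numpy as np

def find_indices(x, y):
    """Return, for each element of y, an index into x holding that value.

    Hypothetical helper; assumes every element of y is present in x.
    """
    order = np.argsort(x)                      # permutation that sorts x
    pos = np.searchsorted(x, y, sorter=order)  # positions within sorted x
    return order[pos]                          # map back to positions in x

rng = np.random.default_rng(0)
x = rng.permutation(1000)   # shuffled 0..999, so every value is unique
y = np.arange(100)

indices = find_indices(x, y)
assert np.array_equal(x[indices], y)   # each index points at the right value
```

The final assertion is a cheap way to confirm the result whenever the "every element of y is in x" assumption is supposed to hold.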

Answered 2025-04-17 by Python大师


I would like to suggest a one-line solution:

indices = np.where(np.in1d(x, y))[0]

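The result is an array of the indices in x of those elements that also occur in y. Note that the indices follow the order of x, not the order of y, and elements of y that are missing from x are simply skipped.

If needed, this can also be done without numpy.where.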
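
To make the output of the membership-mask approach concrete, here is a small sketch with made-up data. It uses `np.isin`, the newer spelling of `np.in1d`, and shows that the indices come back in the order the matches appear in `x`.

```python
import numpy as np

x = np.array([3, 5, 7, 1, 9, 8, 6])
y = np.array([6, 1, 5, 10])   # 10 does not occur in x

mask = np.isin(x, y)          # True where an element of x occurs in y
indices = np.where(mask)[0]   # positions in x, in x order

print(indices.tolist())       # [1, 3, 6]
print(x[indices].tolist())    # [5, 1, 6]
```

Because the order of `y` is lost, this fits best when only the set of matching positions is needed, not a one-to-one mapping from `y` to `x`.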

Answered 2025-04-17 by Python大师


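As Joe Kington said, searchsorted() can look up elements very quickly. To handle elements that are not in x, compare the values found at the looked-up positions against the original y and build a masked array: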

import numpy as np
x = np.array([3,5,7,1,9,8,6,6])
y = np.array([2,1,5,10,100,6])

index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)

yindex = np.take(index, sorted_index, mode="clip")
mask = x[yindex] != y

result = np.ma.array(yindex, mask=mask)
print(result)

The result is:

[-- 3 1 -- -- 6]
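
If a plain integer array is more convenient than a masked array, the same idea can be wrapped so that missing elements are reported with a sentinel value. This is a sketch under that assumption: the function name `index_of` and the `-1` sentinel are illustrative choices, not part of the answer above.

```python
import numpy as np

def index_of(x, y, missing=-1):
    """For each element of y, return an index into x, or `missing` if absent."""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.size == 0:
        # Nothing can match an empty x
        return np.full(y.shape, missing, dtype=np.intp)
    order = np.argsort(x, kind="stable")       # stable: ties keep x order
    pos = np.searchsorted(x, y, sorter=order)  # may equal x.size for large y
    pos = np.clip(pos, 0, x.size - 1)          # keep every position valid
    candidate = order[pos]
    # Accept the candidate only where it really holds the requested value
    return np.where(x[candidate] == y, candidate, missing)

x = np.array([3, 5, 7, 1, 9, 8, 6, 6])
y = np.array([2, 1, 5, 10, 100, 6])
print(index_of(x, y).tolist())   # [-1, 3, 1, -1, -1, 6]
```

With the stable sort, a value that appears several times in `x` (here 6) resolves to its first occurrence.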

Answered 2025-04-17 by Python大师

