Finding unique points in a numpy array

9 votes

2 answers

7257 views

Asked 2025-04-17 05:32

How can I find the unique x,y points (that is, drop the duplicates) faster in a numpy array like this one:

points = numpy.random.randint(0, 5, (10,2))

I considered converting the points to complex numbers and then checking which ones are unique, but that feels rather convoluted:

b = numpy.unique(points[:,0] + 1j * points[:,1])
points = numpy.column_stack((b.real, b.imag))
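One detail of the snippet above is that b.real and b.imag are floating-point, so the rebuilt points array no longer has an integer dtype. A minimal sketch of the same idea with a cast back to the original dtype; the sample data and the name unique_points are only illustrative:

```python
import numpy as np

# Small illustrative input with one duplicated point.
points = np.array([[3, 4], [0, 1], [3, 0], [0, 1]])

# Encode each (x, y) pair as x + y*1j and let np.unique sort and deduplicate.
b = np.unique(points[:, 0] + 1j * points[:, 1])

# .real and .imag are float64, so cast back to recover integer coordinates.
unique_points = np.column_stack((b.real, b.imag)).astype(points.dtype)
print(unique_points)
```

The cast is exact here because small integers are represented exactly as floats.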

numpy, array manipulation, uniqueness checks, data deduplication

2 Answers

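I think you have a very good idea here. Think about the underlying block of memory that holds the data in points. We tell numpy to treat that block as an array of shape (10,2) with dtype int32 (32-bit integers), but it costs almost nothing to tell numpy to treat the same block as an array of shape (10,1) with dtype c8 (64-bit complex), so that each 8-byte row becomes a single value. This only lines up if the data really is int32; on platforms where randint returns int64 by default, the matching pair of dtypes would be 'c16' and 'i8'.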
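To make the "almost free" claim concrete, here is a small sketch that checks that the view shares memory with the original array. It assumes contiguous int32 input; the sample values are arbitrary:

```python
import numpy as np

# Assumption: the points are stored as contiguous int32 pairs (8 bytes per row).
points = np.array([[3, 4], [0, 1], [3, 0], [0, 1]], dtype=np.int32)

# Reinterpret each 8-byte row as one complex64 value; no data is copied.
cpoints = points.view(np.complex64)

print(cpoints.shape)                      # (4, 1)
print(np.shares_memory(points, cpoints))  # True
```

Because nothing is copied, the cost of the view does not grow with the number of points.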

So the only real cost is the call to np.unique, followed by a nearly free view and reshape:

import numpy as np
np.random.seed(1)
points = np.random.randint(0, 5, (10,2)).astype('i4')  # force int32 so one row is exactly one c8 value
print(points)
print(len(points))

yields

[[3 4]
 [0 1]
 [3 0]
 [0 1]
 [4 4]
 [1 2]
 [4 2]
 [4 3]
 [4 2]
 [4 2]]
10

while

cpoints = points.view('c8')
cpoints = np.unique(cpoints)
points = cpoints.view('i4').reshape((-1,2))
print(points)
print(len(points))

yields

[[0 1]
 [1 2]
 [3 0]
 [3 4]
 [4 2]
 [4 3]
 [4 4]]
7
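The view trick depends on the exact dtype, and on the integer bit patterns behaving well when read as floats (a value such as -1, for instance, reads as NaN). A minimal sketch of an alternative that avoids reinterpreting memory, assuming NumPy 1.13 or later, where np.unique accepts an axis argument; the sample data is illustrative:

```python
import numpy as np

# Illustrative input, including negative coordinates and duplicates.
points = np.array([[3, 4], [0, 1], [-1, 2], [0, 1], [4, 2], [-1, 2]])

# axis=0 treats each row as one item; rows come back sorted lexicographically.
unique_points = np.unique(points, axis=0)
print(unique_points)
```

This works for any integer dtype without changes, though it is not necessarily faster than the view-based version.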

If you don't need the result to be sorted, wim's method is faster (you might want to consider accepting his answer...)

import numpy as np
np.random.seed(1)
N=10000
points = np.random.randint(0, 5, (N,2)).astype('i4')  # force int32 so one row is exactly one c8 value

def using_unique():
    cpoints = points.view('c8')
    cpoints = np.unique(cpoints)
    return cpoints.view('i4').reshape((-1,2))

def using_set():
    return np.vstack([np.array(u) for u in set([tuple(p) for p in points])])

gives these benchmark results:

% python -mtimeit -s'import test' 'test.using_set()'
100 loops, best of 3: 18.3 msec per loop
% python -mtimeit -s'import test' 'test.using_unique()'
10 loops, best of 3: 40.6 msec per loop
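The same comparison can also be run in-process. A rough sketch using the standard-library timeit module; the repeat counts and the int32 cast are assumptions, and absolute numbers will differ by machine and NumPy version:

```python
import timeit
import numpy as np

np.random.seed(1)
N = 10000
# Cast to int32 so that each row is exactly one 8-byte complex64 value.
points = np.random.randint(0, 5, (N, 2)).astype('i4')

def using_unique():
    cpoints = np.unique(points.view('c8'))
    return cpoints.view('i4').reshape((-1, 2))

def using_set():
    return np.vstack([np.array(u) for u in set(tuple(p) for p in points)])

for func in (using_unique, using_set):
    # Best of 3 repeats of 10 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} msec per loop")
```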

Answered 2025-04-17 by Python大师


I would do it like this:

numpy.array(list(set(tuple(p) for p in points)))

If you want a fast solution that works for most cases, this recipe may help: http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
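The set-based one-liner returns the points in arbitrary order. If first-seen order matters, a dict can stand in for the set. This is only a sketch, not the linked recipe; the helper name unique_points_in_order is hypothetical, and it relies on dicts preserving insertion order (Python 3.7+):

```python
import numpy as np

def unique_points_in_order(points):
    """Drop duplicate rows, keeping the first occurrence of each (hypothetical helper)."""
    # dict keys keep insertion order, so they come out in first-seen order.
    seen = dict.fromkeys(map(tuple, points))
    return np.array(list(seen))

points = np.array([[3, 4], [0, 1], [3, 0], [0, 1], [4, 4], [3, 4]])
print(unique_points_in_order(points))
```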

Answered 2025-04-17 by Python大师
