Are the elements of an array in a set?

In [1]: import numpy as np
In [2]: nr, nc = 100, 100
In [3]: top = 3000
In [4]: data = np.random.randint(0, top, (nr, nc))
In [5]: test = set(np.random.randint(0, top, top//3))
In [6]: %timeit np.in1d(data, np.hstack(test))
100 loops, best of 3: 5.65 ms per loop
In [7]: %timeit np.in1d(data, np.array(list(test)))
1000 loops, best of 3: 1.4 ms per loop
In [8]: %timeit np.in1d(data, np.fromiter(test, int))
1000 loops, best of 3: 1.33 ms per loop

In [10]: nr, nc = 1000, 1000
In [11]: top = 300000
In [12]: data = np.random.randint(0, top, (nr, nc))
In [13]: test = set(np.random.randint(0, top, top//3))
In [14]: %timeit np.in1d(data, np.hstack(test))
1 loop, best of 3: 706 ms per loop
In [15]: %timeit np.in1d(data, np.array(list(test)))
1 loop, best of 3: 269 ms per loop
In [16]: %timeit np.in1d(data, np.fromiter(test, int))
1 loop, best of 3: 274 ms per loop

2 Answers

User

Answer 1 · edited 2024-04-25 00:23:13

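I'm assuming you are looking for a boolean array that marks which elements of the data array are present in the set. To get it, extract the elements of the set with np.hstack, then use np.in1d to check, for each position in data, whether that value occurs in the set. This gives a boolean array with as many elements as data. Because np.in1d flattens its input before processing, the last step is to reshape its output back to the original 2D shape. So the final implementation would be -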

np.in1d(data,np.hstack(test)).reshape(data.shape)
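As a small worked illustration of the same idea, here is a sketch with fixed inputs. It makes two substitutions that are assumptions of this example, not part of the answer: np.isin is used in place of np.in1d (it keeps the input's shape, so no reshape is needed), and the set is converted with np.fromiter, as in the question, instead of np.hstack, since recent NumPy versions may refuse to stack a set.

```python
import numpy as np

# Fixed inputs so the result is easy to check by eye.
data = np.array([[0, 1, 2],
                 [3, 4, 5],
                 [6, 7, 8]])
test = {1, 4, 7, 20}

# Turn the set into a 1D array (order does not matter for membership tests).
test_arr = np.fromiter(test, dtype=data.dtype, count=len(test))

# np.isin tests each element of data and returns a mask of the same shape.
mask = np.isin(data, test_arr)
print(mask)
```

Only the middle column (1, 4, 7) is in the set, so that column of the mask is True and the rest is False.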


User

Answer 2 · edited 2024-04-25 00:23:13

The expression a = data < 6 returns a new array, because < is a value comparison operator.

Arithmetic, matrix multiplication, and comparison operations
Arithmetic and comparison operations on ndarrays are defined as element-wise operations, and generally yield ndarray objects as results.
Each of the arithmetic operations (+, -, *, /, //, %, divmod(), ** or pow(), <<, >>, &, ^, |, ~) and the comparisons (==, <, >, <=, >=, !=) is equivalent to the corresponding universal function (or ufunc for short) in Numpy.
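A minimal sketch of what the quoted passage means for the < operator, using a small hand-picked array (the values are illustrative only):

```python
import numpy as np

data = np.array([[1, 7],
                 [6, 2]])

# The operator form and the ufunc form compute the same element-wise result.
a = data < 6
b = np.less(data, 6)

print(a)
print(np.array_equal(a, b))
```

Both forms produce a new boolean array with the same shape as data.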

Note that the in operator is not in this list, probably because it works in the opposite direction from most operators.

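While a + b is the same as a.__add__(b), a in b is evaluated right to left, as b.__contains__(a). In this case, Python tries to call set.__contains__(), which only accepts hashable (immutable) types. Arrays are mutable, so they cannot be members of a set.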
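The failure can be reproduced in a few lines. This is a sketch with made-up values; the exact wording of the error message may vary between Python versions.

```python
import numpy as np

data = np.arange(4)   # array([0, 1, 2, 3])
test = {1, 3}

# A single element is a hashable scalar, so membership works.
scalar_result = data[1] in test

# The whole array is not hashable, so set.__contains__ raises TypeError.
try:
    data in test
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(scalar_result)
print(error_message)
```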

The workaround is to use numpy.vectorize instead of in. It calls an arbitrary Python function on each element of the array.

It is a kind of map() for numpy arrays.

numpy.vectorize
Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a numpy array as output. The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy.

>>> import numpy
>>> data = numpy.random.randint(0, 10, (3, 3))
>>> test = set(numpy.random.randint(0, 10, 5))
>>> numpy.vectorize(test.__contains__)(data)

array([[False, False,  True],
       [ True,  True, False],
       [ True, False,  True]], dtype=bool)
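To make the map() analogy concrete, here is a sketch with fixed inputs that compares numpy.vectorize against a plain map() over the flattened array. Passing otypes=[bool] is an addition of this example; it states the output type up front so numpy.vectorize does not have to infer it from the first call.

```python
import numpy as np

data = np.array([[0, 1, 2],
                 [3, 4, 5],
                 [6, 7, 8]])
test = {0, 4, 8}

# Wrap the set's membership test so it is applied to every element.
contains = np.vectorize(test.__contains__, otypes=[bool])
mask = contains(data)

# The same result built by hand with map() over the flattened values.
flat = data.ravel().tolist()
mapped = np.array(list(map(test.__contains__, flat))).reshape(data.shape)

print(mask)
print(np.array_equal(mask, mapped))
```

numpy.vectorize still calls the Python function once per element, so it is a convenience wrapper and not a compiled loop.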

Benchmark

This approach is fast when n is large, because set.__contains__() is a constant-time operation. ("Large" here means top > 13000 or so.)


However, when n is small, the other solutions are much faster.

Appendix - benchmarking the different solutions
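One way to compare the approaches outside IPython is a small timeit harness like the sketch below. The function names with_isin and with_vectorize, the seed and the sizes are all assumptions of this example, and np.isin stands in for np.in1d. No timings are quoted here because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
nr, nc, top = 100, 100, 3000
data = rng.integers(0, top, (nr, nc))
test = set(rng.integers(0, top, top // 3).tolist())

def with_isin():
    # Convert the set to an array, then test membership in one call.
    test_arr = np.fromiter(test, dtype=data.dtype, count=len(test))
    return np.isin(data, test_arr)

contains = np.vectorize(test.__contains__, otypes=[bool])

def with_vectorize():
    # Per-element set lookup through numpy.vectorize.
    return contains(data)

# Both approaches must agree before their timings are worth comparing.
results_match = np.array_equal(with_isin(), with_vectorize())

# Best of 3 repeats, 10 calls each, reported as seconds per call.
timings = {
    func.__name__: min(timeit.repeat(func, number=10, repeat=3)) / 10
    for func in (with_isin, with_vectorize)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Changing nr, nc and top to the values used in the question shows how the two approaches scale on a given machine.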
