Numpy: Finding the value in each row from an array of indexes

Asked 2025-04-17 18:22

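I have a 2D array of values and a 1D array of indexes. I want to use the index array to pull the corresponding value out of each row. The code below does this successfully: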

from pprint import pprint
import numpy as np

# 10x10 array of float16 values 0..99
_2Darray = np.arange(100, dtype=np.float16)
_2Darray = _2Darray.reshape((10, 10))
# one column index per row
array_indexes = [5, 5, 5, 4, 4, 4, 6, 6, 6, 8]
index_values = []
for row, index in enumerate(array_indexes):
    # take the value in column `index` of row `row`
    index_values.append(_2Darray[row, index])
pprint(_2Darray)
print(index_values)

This prints:

array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.],
       [ 40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.],
       [ 50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.],
       [ 60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.],
       [ 70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.],
       [ 80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.],
       [ 90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.]], dtype=float16)
[5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]

But I would like to do this using only numpy functions. I have tried many of them, but none does this in a simple way.

Thanks in advance!

EDIT: I finally came up with an implementation of my own:

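# np.fromiter fills a 1-D array straight from the generator;
# count=len(_2Darray) lets it allocate the output once
index_values = np.fromiter((_2Darray[ind[0], ind[1]] for ind in
                            enumerate(array_indexes)),
                           dtype=_2Darray.dtype,
                           count=len(_2Darray))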
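For reference, a self-contained version of the np.fromiter approach, wrapped in a helper whose name `rows_by_index_fromiter` is made up for illustration. As an assumption, it sizes the output from the index list (`count=len(col_idx)`) rather than from the row count, which only matters if the two lengths differ. It checks the result against plain fancy indexing on the 10x10 example:

```python
import numpy as np

def rows_by_index_fromiter(arr, col_idx):
    """Return arr[i, col_idx[i]] for every row i (hypothetical helper name)."""
    # The generator yields one scalar per row; count lets fromiter
    # allocate the 1-D result once instead of growing it.
    return np.fromiter(
        (arr[row, col] for row, col in enumerate(col_idx)),
        dtype=arr.dtype,
        count=len(col_idx),
    )

a = np.arange(100, dtype=np.float16).reshape(10, 10)
idx = [5, 5, 5, 4, 4, 4, 6, 6, 6, 8]
picked = rows_by_index_fromiter(a, idx)
print(picked)
# compare with vectorised fancy indexing
print(np.array_equal(picked, a[np.arange(len(a)), idx]))
```

The generator still runs one Python-level indexing operation per row, so this mainly saves the list building, not the per-row overhead.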

Thanks to root, I now have both my implementation and his working. Now for some performance testing. My implementation, run through cProfile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    2    0.274    0.137    0.622    0.311 {numpy.core.multiarray.fromiter}
20274    0.259    0.000    0.259    0.000 lazer_np.py:86(<genexpr>)

And root's implementation:

    4    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
    1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.arange}

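I can hardly believe that cProfile did not register any time at all for root's method. I thought this might be some kind of bug, but his method is clearly faster: in an earlier test I found it to be about 3 times faster.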

Note: these tests were run on an array of np.float16 values with shape (20273, 200). Also, each indexing method is run twice in the tests.
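On the cProfile puzzle: cProfile records calls to Python functions and to built-in C functions such as np.fromiter and np.arange, but a subscript like `_2Darray[rows, cols]` is not a function call, so its cost is charged to the enclosing function's own time instead of getting a row of its own. That is the likely reason root's method appears to take no time, rather than a bug. A wall-clock comparison with the standard timeit module avoids this. The sketch below assumes random float16 data of the shape from the note and one random column index per row; the function names are made up, and the printed timings depend on the machine and NumPy version:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
# Assumed test data: random float16 values with the shape from the note
arr = rng.random((20273, 200)).astype(np.float16)
# Assumed: one random column index per row
cols = rng.integers(0, arr.shape[1], size=arr.shape[0])

def with_fromiter():
    # one Python-level lookup per row
    return np.fromiter((arr[r, c] for r, c in enumerate(cols)),
                       dtype=arr.dtype, count=len(arr))

def with_fancy_indexing():
    # a single vectorised lookup: row i is paired with cols[i]
    return arr[np.arange(len(arr)), cols]

assert np.array_equal(with_fromiter(), with_fancy_indexing())

for fn in (with_fromiter, with_fancy_indexing):
    # best of 5 repeats of 10 calls each, reported per call
    best = min(timeit.repeat(fn, number=10, repeat=5)) / 10
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```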

3 Answers

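Be careful to use the numpy functions meant for arrays rather than those meant for matrices. The two are easy to mix up, and using one's functions on the other usually raises no error, but the results can be very surprising.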
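The answer does not name specific functions; assuming "matrix" means the np.matrix subclass (discouraged in current NumPy), here is a short sketch of two silent differences, including how the row-wise lookup from the question behaves on each type:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # plain ndarray
m = np.matrix([[1, 2], [3, 4]])  # np.matrix subclass

# `*` is elementwise for ndarray but matrix multiplication for np.matrix
print(a * a)  # elementwise product, values [[1, 4], [9, 16]]
print(m * m)  # matrix product, values [[7, 10], [15, 22]]

# np.matrix forces results to stay 2-D, even for the question's lookup
rows, cols = np.arange(2), [1, 0]
print(a[rows, cols].shape)  # (2,)
print(m[rows, cols].shape)  # (1, 2)
```

Neither line raises an error, which is why mixing the two types can go unnoticed.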

Answered 2025-04-17 by Python大师

In [15]: _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
Out[15]: array([  5.,  15.,  25.,  34.,  44.,  54.,  66.,  76.,  86.,  98.],
         dtype=float16)

In [22]: def f(array, indices):
    ...:     return [array[row, index] for row, index in enumerate(indices)]

In [23]: f(_2Darray, [5,5,5,4,4,4,6,6,6,8])
Out[23]: [5.0, 15.0, 25.0, 34.0, 44.0, 54.0, 66.0, 76.0, 86.0, 98.0]

In [27]: %timeit f(_2Darray,[5,5,5,4,4,4,6,6,6,8])
100000 loops, best of 3: 7.48 us per loop

In [28]: %timeit _2Darray[np.arange(len(_2Darray)), [5,5,5,4,4,4,6,6,6,8]]
10000 loops, best of 3: 24.2 us per loop

That said, as the timings show, your solution may well be faster for small arrays. If the array is larger than about 100*100, use numpy's indexing instead.
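The timings above are for the 10x10 example only, and where the crossover lies will vary with the machine and NumPy version. A rough sketch for checking it yourself, assuming square float16 arrays of a few arbitrarily chosen sizes and random column indexes; `loop_lookup` and `fancy_lookup` are illustrative names:

```python
import timeit
import numpy as np

def loop_lookup(arr, idx):
    # pure-Python loop, as in the question
    return [arr[row, col] for row, col in enumerate(idx)]

def fancy_lookup(arr, idx):
    # vectorised: pairs row i with idx[i]
    return arr[np.arange(len(arr)), idx]

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    arr = rng.random((n, n)).astype(np.float16)
    idx = rng.integers(0, n, size=n)
    # best of 3 repeats of 100 calls, reported per call
    t_loop = min(timeit.repeat(lambda: loop_lookup(arr, idx), number=100, repeat=3)) / 100
    t_fancy = min(timeit.repeat(lambda: fancy_lookup(arr, idx), number=100, repeat=3)) / 100
    print(f"n={n:5d}  loop {t_loop * 1e6:9.1f} us  fancy {t_fancy * 1e6:9.1f} us")
```

The loop's cost grows with the number of rows, while fancy indexing pays a roughly fixed setup cost, which is why the loop can win only on small inputs.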

Answered 2025-04-17 by Python大师

This should do it:

import numpy

row = numpy.arange(_2Darray.shape[0])        # one row index per row: 0, 1, ..., n-1
index_values = _2Darray[row, array_indexes]  # pairs row[i] with array_indexes[i]
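Another option for the same lookup, swapped in here and not part of the original answer, is np.take_along_axis (available since NumPy 1.15). It requires the index array to have as many dimensions as the data, hence the `[:, None]` and the final `ravel()`. A minimal sketch on the 10x10 example:

```python
import numpy as np

a = np.arange(100, dtype=np.float16).reshape(10, 10)
array_indexes = np.array([5, 5, 5, 4, 4, 4, 6, 6, 6, 8])

# Fancy indexing: row and column index arrays are paired element by element
picked = a[np.arange(a.shape[0]), array_indexes]

# take_along_axis: indexes of shape (10, 1) select one column per row,
# giving a (10, 1) result that ravel() flattens to (10,)
picked_too = np.take_along_axis(a, array_indexes[:, None], axis=1).ravel()

print(picked)
print(np.array_equal(picked, picked_too))
```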

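Numpy lets you index a 2D array (in fact any multidimensional array) with two index arrays, so that this loop:

# assumes `array` is 2-D and `row`, `col` are equal-length integer index arrays
result1 = numpy.empty(len(row), dtype=array.dtype)
for i in range(len(row)):
    result1[i] = array[row[i], col[i]]

gives the same values as a single indexing expression:

result2 = array[row, col]
numpy.all(result1 == result2)  # True for any array without NaNs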
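The same rule carries over to more dimensions: pass one index array per axis and element n of the result is taken from the n-th entry of each. A small sketch on a made-up 3-D array:

```python
import numpy as np

arr3 = np.arange(24).reshape(2, 3, 4)
i = np.array([0, 1, 1])
j = np.array([2, 0, 1])
k = np.array([3, 3, 0])

# Element n of the result is arr3[i[n], j[n], k[n]]
result = arr3[i, j, k]
expected = np.array([arr3[a, b, c] for a, b, c in zip(i, j, k)])
print(result)
print(np.array_equal(result, expected))
```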

Answered 2025-04-17 by Python大师
