NumPy indexing: finding slices of one array in another

Asked 2025-04-17 23:28

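The actual problem came up in a machine learning application where the data is somewhat complicated, so here is a simple example that captures the essence of the problem:

I have two arrays, created like this: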

L = np.arange(12).reshape(4,3)
M = np.arange(12).reshape(6,2)

Now I want to find the rows R of L such that some row of M consists of all the elements of R except the last one.

With the sample code above, L and M look like this:

array([[ 0,  1,  2],  # L
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

array([[ 0,  1],  # M
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])

From these I want to extract the matching rows of L as a numpy array:

array([[ 0,  1,  2],
       [ 6,  7,  8]])

If L and M were Python lists, I would do this:

L = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
M = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]
answer = [R for R in L if R[:-1] in M]

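Now, I know I could use a similar list comprehension with numpy and convert the result to an array. But numpy is very powerful, and there may be a more elegant way to do this that I don't know about.

I tried looking at np.where (to get the required indices, which I could then use to index into L), but it doesn't seem to do what I need.

I would appreciate any help.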

4 Answers

>>> print(np.array([row for row in L if row[:-1] in M]))
[[0 1 2]
 [6 7 8]]
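
One caveat, offered as a hedged aside: the comprehension above behaves as intended when L and M are the plain Python lists from the question. With NumPy arrays, `row[:-1] in M` evaluates roughly as `(M == row[:-1]).any()`, which is True as soon as a single element matches in its column, so it can report rows that are not really in M. A minimal sketch, assuming the arrays from the question, converts to nested lists first:

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# With arrays, `in` compares element-wise and then reduces with any(),
# so a partial match counts as a hit: [0, 99] is not a row of M, yet:
partial_hit = np.array([0, 99]) in M   # True, because M[0, 0] == 0

# Converting to nested lists restores whole-row comparison.
M_rows = M.tolist()
answer = np.array([row for row in L.tolist() if row[:-1] in M_rows])
print(answer)
```

The sample data happens to give the right answer either way, which is why the pitfall is easy to miss.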

Answered 2025-04-17 by Python大师

This is very similar to Bitwise's answer:

def fn(a):
    return lambda b: np.all(a==b, axis=1)
matches = np.apply_along_axis(fn(M), 1, L[:,:2])
result = L[np.any(matches, axis=1)]

Behind the scenes, what happens is roughly this (I'll use Bitwise's example, since it is easier to follow):

>>> M
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
>>> M.shape+=(1,)
>>> M
array([[[ 0],
        [ 1]],

       [[ 2],
        [ 3]],

       [[ 4],
        [ 5]],

       [[ 6],
        [ 7]],

       [[ 8],
        [ 9]],

       [[10],
        [11]]])

We added a dimension to M, so its shape is now (6,2,1).

>>> L2 = L[:,:-1].T

Next, we drop the last column (keeping the first two) and transpose the array, so its shape becomes (2,4).

This is where the magic happens: M and L2 can now be broadcast to an array of shape (6,2,4).
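
As a quick check of that claim, here is a minimal sketch. It assumes NumPy 1.20 or later for `np.broadcast_shapes`, and it uses `np.newaxis` (my substitution, not the answer's code) so that M itself is left unchanged:

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

M3 = M[:, :, np.newaxis]   # shape (6, 2, 1); a view, M itself is unchanged
L2 = L[:, :-1].T           # shape (2, 4)

# np.broadcast_shapes reports the common shape without
# allocating the broadcast arrays.
common = np.broadcast_shapes(M3.shape, L2.shape)
print(common)  # (6, 2, 4)
```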

From the numpy documentation:

A set of arrays is called "broadcastable" to the same shape if the above rules produce a valid result, i.e., one of the following is true:
1. The arrays all have exactly the same shape.
2. The arrays all have the same number of dimensions and the length of each dimensions is either a common length or 1.
3. The arrays that have too few dimensions can have their shapes prepended with a dimension of length 1 to satisfy property 2.

Example

If a.shape is (5,1), b.shape is (1,6), c.shape is (6,) and d.shape is () so that d is a scalar, then a, b, c, and d are all broadcastable to dimension (5,6); and
a acts like a (5,6) array where a[:,0] is broadcast to the other columns,
b acts like a (5,6) array where b[0,:] is broadcast to the other rows,
c acts like a (1,6) array and therefore like a (5,6) array where c[:] is broadcast to every row, and finally,
d acts like a (5,6) array where the single value is repeated.
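
The documentation's example can be reproduced directly. This is only an illustrative sketch; the zero-filled arrays are placeholders chosen for their shapes:

```python
import numpy as np

a = np.zeros((5, 1))
b = np.zeros((1, 6))
c = np.zeros(6)
d = np.array(0.0)   # zero-dimensional array, shape ()

# broadcast_arrays returns one broadcast view per input
shapes = [x.shape for x in np.broadcast_arrays(a, b, c, d)]
print(shapes)  # four copies of (5, 6)
```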

M[:,:,0] is repeated 4 times to fill the third dimension, while L2 gets a new leading dimension and is repeated 6 times to fill it.

>>> B = np.broadcast_arrays(L2,M)
>>> B
[array([[[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]],

       [[ 0,  3,  6,  9],
        [ 1,  4,  7, 10]]]),


array([[[ 0,  0,  0,  0],
        [ 1,  1,  1,  1]],

       [[ 2,  2,  2,  2],
        [ 3,  3,  3,  3]],

       [[ 4,  4,  4,  4],
        [ 5,  5,  5,  5]],

       [[ 6,  6,  6,  6],
        [ 7,  7,  7,  7]],

       [[ 8,  8,  8,  8],
        [ 9,  9,  9,  9]],

       [[10, 10, 10, 10],
        [11, 11, 11, 11]]])]

Now we can compare element by element:

>>> np.equal(*B)
array([[[ True, False, False, False],
        [ True, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False,  True, False],
        [False, False,  True, False]],

       [[False, False, False, False],
        [False, False, False, False]],

       [[False, False, False, False],
        [False, False, False, False]]], dtype=bool)

Compare whole rows (axis = 1), so that an entry is True only if every element of the pair matches:

>>> np.all(np.equal(*B), axis=1)
array([[ True, False, False, False],
       [False, False, False, False],
       [False, False, False, False],
       [False, False,  True, False],
       [False, False, False, False],
       [False, False, False, False]], dtype=bool)

Aggregate over the rows of M (axis = 0) to get one value per row of L:

>>> C = np.any(np.all(np.equal(*B), axis=1), axis=0)
>>> C
array([ True, False,  True, False], dtype=bool)

This gives you a boolean mask that can be applied to L.

>>> L[C]
array([[0, 1, 2],
       [6, 7, 8]])

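apply_along_axis takes advantage of the same property, but reduces the dimensionality of L instead of increasing that of M (thus adding an implicit loop).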
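
To make the equivalence concrete, the sketch below computes the mask both ways and checks that they agree. The helper name `matches_any_row` is hypothetical, introduced only for this illustration:

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# Broadcasting version: compare every row of M with every row prefix of L.
# (6, 2, 1) against (2, 4) broadcasts to (6, 2, 4).
mask_broadcast = np.any(
    np.all(L[:, :-1].T == M[:, :, np.newaxis], axis=1), axis=0
)

# apply_along_axis version: one Python-level call per row of L.
def matches_any_row(prefix):
    # prefix has shape (2,); M == prefix has shape (6, 2)
    return np.all(M == prefix, axis=1)

matches = np.apply_along_axis(matches_any_row, 1, L[:, :-1])  # shape (4, 6)
mask_loop = np.any(matches, axis=1)

print(np.array_equal(mask_broadcast, mask_loop))
```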

Answered 2025-04-17 by Python大师

OK, I get it. The key is to add a dimension to M so that you can use broadcasting:

M.shape += (1,)
E = np.all(L[:,:-1].T == M, 1)

This gives you a 6x4 boolean matrix E that holds the comparison of every row of M against every row of L.

From here it is simple:

result = L[np.any(E,0)]

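This makes the solution more concise, and you don't need any lambda functions or "implicit loops" (such as np.apply_along_axis()).

Indeed, numpy's vectorization is great (though sometimes you have to think rather abstractly)...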
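
For reuse, the same idea can be wrapped in a function. This is a sketch under stated assumptions, not the answer's own code: the name `rows_with_prefix_in` is hypothetical, it uses a different axis layout than the snippet above, and it uses `np.newaxis` instead of assigning to `M.shape`, so M is not modified in place. It also returns the mask, from which `np.flatnonzero` gives the row indices that the question tried to get via np.where.

```python
import numpy as np

def rows_with_prefix_in(L, M):
    """Return (rows, mask): rows of L whose columns except the last equal some row of M."""
    prefixes = L[:, :-1]                                  # shape (n, k)
    # (n, 1, k) against (1, m, k) broadcasts to (n, m, k)
    equal = prefixes[:, np.newaxis, :] == M[np.newaxis, :, :]
    mask = equal.all(axis=2).any(axis=1)                  # shape (n,)
    return L[mask], mask

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)
result, mask = rows_with_prefix_in(L, M)
print(result)
print(np.flatnonzero(mask))  # indices of the matching rows of L
```

Note that the intermediate boolean array holds n * m * k elements, so for very large inputs the memory cost of full broadcasting may matter.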

Answered 2025-04-17 by Python大师

>>> import hashlib
>>> fn = lambda xs: hashlib.sha1(xs.tobytes()).hexdigest()  # hash the raw bytes of each row
>>> m = np.apply_along_axis(fn, 1, M)
>>> l = np.apply_along_axis(fn, 1, L[:,:-1])
>>> L[np.isin(l, m)]  # np.isin supersedes the older np.in1d
array([[0, 1, 2],
       [6, 7, 8]])
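
A related variant, sketched here as an alternative rather than as this answer's method, skips the digests and relies on Python's built-in hashing of tuples. It assumes the rows are small enough that converting them to Python objects is acceptable:

```python
import numpy as np

L = np.arange(12).reshape(4, 3)
M = np.arange(12).reshape(6, 2)

# Tuples are hashable, so the rows of M can go straight into a set.
seen = set(map(tuple, M.tolist()))

# One set lookup per row of L, comparing everything but the last column.
mask = np.array(
    [tuple(prefix) in seen for prefix in L[:, :-1].tolist()], dtype=bool
)
print(L[mask])
```

Like the hashing approach, this avoids building the full pairwise comparison array, at the cost of a Python-level loop.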

Answered 2025-04-17 by Python大师
