Best practice for fancy indexing a numpy array along multiple axes

4 votes

3 answers

2269 views

Asked 2025-04-30 17:43

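I'm trying to optimize an algorithm to reduce its memory usage, and I've identified this particular operation as a pain point.

I have a symmetric matrix, an array of indices along the rows, and another array of indices along the columns (the column indices are simply every value not selected by the row indices). I feel I should be able to pass both index arrays at once, but instead I'm forced to index along one axis first and then the other. This causes memory problems, because I don't actually need the returned copy of the array, only some statistics computed from it. Here is what I would like to do: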

from scipy.spatial.distance import pdist, squareform
from sklearn import datasets
import numpy as np

iris = datasets.load_iris().data

dx = pdist(iris)
mat = squareform(dx)

outliers = [41,62,106,108,109,134,135]
inliers = np.setdiff1d(np.arange(iris.shape[0]), outliers)

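# What I want to be able to do.
# As written this raises IndexError: the index arrays have shapes
# (143,) and (7,), which cannot be broadcast together.
scores = mat[inliers, outliers].min(axis=0)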

Here is what I actually do to make it work:

# What I'm being forced to do:
s1 = mat[:,outliers]
scores = s1[inliers,:].min(axis=0)

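Because I'm using fancy indexing, s1 is a new array rather than a view. I only need this array for a single operation, so it would be better if I could avoid the copy here, or at least make the new array smaller (that is, take the second fancy index into account while applying the first, instead of performing two separate fancy-indexing operations).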
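To make the copy-versus-view distinction concrete, here is a minimal sketch on a made-up 6x6 symmetric matrix. The matrix, the index values, and the variable names `basic`, `fancy`, `view_shares`, and `copy_shares` are assumptions for illustration, not the iris data above.

```python
import numpy as np

# Hypothetical small symmetric matrix standing in for `mat`.
rng = np.random.default_rng(0)
a = rng.random((6, 6))
mat = (a + a.T) / 2

outliers = np.array([1, 4])
inliers = np.setdiff1d(np.arange(mat.shape[0]), outliers)

basic = mat[:, 1:3]        # basic slicing returns a view of mat
fancy = mat[:, outliers]   # fancy indexing allocates a new array

view_shares = np.shares_memory(mat, basic)   # the view shares mat's buffer
copy_shares = np.shares_memory(mat, fancy)   # the copy does not

# The intermediate copy holds every row, although only the
# inlier rows are needed for the final statistic.
full_rows, needed_rows = fancy.shape[0], inliers.size
```

The intermediate array here has `full_rows` rows while only `needed_rows` of them are used, which is the extra allocation the question is trying to avoid.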


3 Answers

Try this:

outliers = np.array(outliers)  # just to be sure they are arrays
result = mat[inliers[:, np.newaxis], outliers[np.newaxis, :]].min(0)

Answered 2025-04-30 by Python大师


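Broadcasting also applies to indexing. You can turn inliers into a column array (for example with inliers.reshape(-1,1) or inliers[:, np.newaxis]) so that its shape becomes (m,1). Then use that column array to index the first axis of mat; it broadcasts against outliers, of shape (n,), so the result has shape (m,n):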

s1 = mat[inliers.reshape(-1,1), outliers]
scores = s1.min(axis=0)
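As a rough check of the broadcasting behavior, the sketch below uses a toy 6x6 matrix and made-up index arrays (`rows` and `cols` are hypothetical stand-ins for inliers and outliers) and compares the single broadcast index with the two-step approach from the question.

```python
import numpy as np

# Toy symmetric matrix, not the iris distance matrix.
base = np.arange(36).reshape(6, 6)
mat = base + base.T

rows = np.array([0, 2, 3, 5])   # stand-in for inliers
cols = np.array([1, 4])         # stand-in for outliers

# A (4, 1) index broadcast against a (2,) index selects a (4, 2) block.
idx_rows = rows[:, np.newaxis]
block = mat[idx_rows, cols]

# The two-step version builds a (6, 2) intermediate first.
two_step = mat[:, cols][rows, :]

scores = block.min(axis=0)      # one minimum per selected column
```

Both routes give the same block, but the broadcast version only allocates the (4, 2) result.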

Answered 2025-04-30 by Python大师


For readability, there is a nicer way:

result = mat[np.ix_(inliers, outliers)].min(0)

https://docs.scipy.org/doc/numpy/reference/generated/numpy.ix_.html#numpy.ix_
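For reference, np.ix_ just builds the broadcastable index arrays for you. A small sketch, assuming a toy 6x6 matrix and made-up `rows` and `cols` index arrays:

```python
import numpy as np

mat = np.arange(36).reshape(6, 6)   # toy matrix for illustration
rows = np.array([0, 2, 3, 5])
cols = np.array([1, 4])

# np.ix_ returns one index array per axis, shaped to broadcast
# against each other: (4, 1) for rows and (1, 2) for cols.
row_idx, col_idx = np.ix_(rows, cols)

via_ix = mat[np.ix_(rows, cols)]
via_broadcast = mat[rows[:, np.newaxis], cols[np.newaxis, :]]
```

The two results are identical, so the choice between them is mostly a matter of readability.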

Answered 2025-04-30 by Python大师
