Efficiently computing the distance from N points to a reference point in numpy/scipy

28 votes

8 answers

60415 views

数据工程师

Asked 2025-04-16 20:01

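I have just started using scipy and numpy. I have a 100000x3 array in which each row is a coordinate, and a 1x3 center point. I want to compute the distance from every row of the array to the center point and store the distances in another array. What is the most efficient way to do this?

numpy scipy array-operations distance-calculation

8 answers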

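You can also use the expansion of the norm (the same way you would expand a notable algebraic identity). This is probably the most efficient way to compute distances between sets of points.

Below is a snippet I originally wrote in Octave for a k-nearest-neighbors algorithm, but you can easily adapt it to numpy, since it only relies on matrix multiplication (the numpy equivalent is numpy.dot()):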

% Computing the Euclidean distance between each known point (Xapp) and each unknown point (Xtest)
% Note: we expand the norm just like a notable identity:
% ||x1 - x2||^2 = ||x1||^2 + ||x2||^2 - 2*<x1,x2>
[napp, d] = size(Xapp);
[ntest, d] = size(Xtest);

A = sum(Xapp.^2, 2);
A = repmat(A, 1, ntest);

B = sum(Xtest.^2, 2);
B = repmat(B', napp, 1);

C = Xapp*Xtest';

% dist holds the squared distances; take sqrt(dist) for the distances themselves
dist = A+B-2.*C;
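For reference, here is a rough numpy port of the Octave snippet above. It is only a sketch: the function name pairwise_sq_dists and the small random test arrays are made up for illustration, and broadcasting stands in for repmat. The clipping step is an addition that guards against tiny negative values caused by floating-point rounding.

```python
import numpy as np

def pairwise_sq_dists(Xapp, Xtest):
    """Squared Euclidean distances between rows, shape (napp, ntest)."""
    A = np.sum(Xapp ** 2, axis=1)[:, np.newaxis]   # ||x1||^2 as a column, (napp, 1)
    B = np.sum(Xtest ** 2, axis=1)[np.newaxis, :]  # ||x2||^2 as a row, (1, ntest)
    C = np.dot(Xapp, Xtest.T)                      # <x1, x2>, (napp, ntest)
    # Rounding can leave tiny negative values; clip them to zero.
    return np.maximum(A + B - 2.0 * C, 0.0)

rng = np.random.default_rng(0)
Xapp = rng.random((5, 3))    # hypothetical "known" points
Xtest = rng.random((4, 3))   # hypothetical "unknown" points
dist = np.sqrt(pairwise_sq_dists(Xapp, Xtest))
```

For the question as asked, Xtest would simply be the single 1x3 center point, which gives a (100000, 1) result.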

Answered 2025-04-16 by Python大师

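I would use the Euclidean distance implementation from sklearn. Its advantage is that it relies on a more efficient expression based on matrix multiplication:

dist(x, y) = sqrt(np.dot(x, x) - 2 * np.dot(x, y) + np.dot(y, y))

A simple script looks like this: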

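import numpy as np

x = np.random.rand(1000, 3)
y = np.random.rand(1000, 3)

# Squared norm of every row, shaped so they broadcast to (1000, 1000)
xx = np.sum(x * x, axis=1)[:, np.newaxis]
yy = np.sum(y * y, axis=1)[np.newaxis, :]

# ||x - y||^2 = ||x||^2 - 2<x, y> + ||y||^2; clip tiny negatives from rounding
dist = np.sqrt(np.maximum(xx - 2 * np.dot(x, y.T) + yy, 0))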

The advantages of this approach are described well in the sklearn documentation:

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html#sklearn.metrics.pairwise.euclidean_distances

I use this approach on large data matrices (10000, 10000), with a few small modifications such as using the np.einsum function.
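As an illustration of the np.einsum idea applied to the original question (N points against a single center), something along these lines should work. This is a sketch and not necessarily the modification described above; the names points, center and diff are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((100000, 3))   # N points, one per row
center = rng.random((1, 3))        # single reference point

diff = points - center             # broadcasting gives shape (100000, 3)
# Row-wise dot product of diff with itself yields the squared distances.
dist = np.sqrt(np.einsum('ij,ij->i', diff, diff))
```

The 'ij,ij->i' subscript multiplies the two arrays element-wise and sums over the column index, so no intermediate array of squares has to be created.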

Answered 2025-04-16 by Python大师

I suggest you take a look at scipy.spatial.distance.cdist:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

import numpy as np
from scipy.spatial.distance import cdist

a = np.random.normal(size=(10, 3))
b = np.random.normal(size=(1, 3))

# The default metric is Euclidean; pass metric=... to pick a different one
dist = cdist(a, b)

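With the default metric (Euclidean), dist is equivalent to the following, except that cdist returns an array of shape (10, 1) rather than a flat one: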

np.sqrt(np.sum((a-b)**2,axis=1))

For large arrays, however, cdist is much more efficient (on my machine, for the problem size in your question, cdist was about 35 times faster).
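Timings like this depend heavily on the machine and on the numpy/scipy versions, so it is worth measuring on your own setup. A minimal sketch for doing so, using the array sizes from the question (the helper names with_cdist and with_numpy are made up for this example):

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
a = rng.normal(size=(100000, 3))   # the N points
b = rng.normal(size=(1, 3))        # the reference point

def with_cdist():
    # cdist returns shape (100000, 1); flatten it for comparison
    return cdist(a, b).ravel()

def with_numpy():
    return np.sqrt(np.sum((a - b) ** 2, axis=1))

# Total time for 20 calls of each variant
t_cdist = timeit.timeit(with_cdist, number=20)
t_numpy = timeit.timeit(with_numpy, number=20)
print(f"cdist: {t_cdist:.4f} s, numpy: {t_numpy:.4f} s")
```

Both functions should return the same values, so whichever is faster on your setup can be used interchangeably.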

Answered 2025-04-16 by Python大师
