使用numpy在矩阵上计算距离

1 投票

2 回答

3126 浏览

提问于 2025-04-17 10:03

我正在尝试用Python实现K均值算法（我知道有现成的库可以用，但我想自己学会怎么实现它）。这里是我遇到问题的函数：

def AssignPoints(points, centroids):
    """
    Takes two arguments:
    points is a numpy array such that points.shape = m , n where m is number of examples,
    and n is number of dimensions.

    centroids is numpy array such that centroids.shape = k , n where k is number of centroids.
    k < m should hold.

    Returns:
    numpy array A such that A.shape = (m,) and A[i] is index of the centroid which points[i] is assigned to.
    """

    m ,n = points.shape
    temp = []
    for i in xrange(n):
        temp.append(np.subtract.outer(points[:,i],centroids[:,i]))
    distances = np.hypot(*temp)
    return distances.argmin(axis=1)

这个函数的目的是，给定m个点在n维空间中，以及k个质心在n维空间中，生成一个numpy数组（x1 x2 x3 x4 ... xm），其中x1是离第一个点最近的质心的索引。这个功能之前运行得很好，但当我尝试用4维的例子时，就出现了这个错误：

  File "/path/to/the/kmeans.py", line 28, in AssignPoints
    distances = np.hypot(*temp)
ValueError: invalid number of arguments

我该怎么解决这个问题，或者如果解决不了，你有什么建议可以帮助我计算我想要的结果吗？

我的回答

def AssignPoints(points, centroids):
    m ,n = points.shape
    temp = []
    for i in xrange(n):
        temp.append(np.subtract.outer(points[:,i],centroids[:,i]))
    for i in xrange(len(temp)):
        temp[i] = temp[i] ** 2
    distances = np.add.reduce(temp) ** 0.5
    return distances.argmin(axis=1)

numpy 算法实现质心矩阵运算距离计算 k-means 多维空间

2 个回答

你需要定义一个距离函数，这里用到了 hypot。通常在 K-means 算法中，距离的计算方式是：距离 = 点与中心点之间的差的平方和。这里有一些 Matlab 代码可以实现这个功能……如果你不会的话，我可以帮你移植到其他语言，但你也可以试试看。就像你说的，只有通过实践才能学会。

function idx = findClosestCentroids(X, centroids)
%FINDCLOSESTCENTROIDS computes the centroid memberships for every example
%   idx = FINDCLOSESTCENTROIDS (X, centroids) returns the closest centroids
%   in idx for a dataset X where each row is a single example. idx = m x 1 
%   vector of centroid assignments (i.e. each entry in range [1..K])
%

% Set K
K = size(centroids, 1);

[numberOfExamples numberOfDimensions] = size(X);
% You need to return the following variables correctly.
idx = zeros(size(X,1), 1);


% Go over every example, find its closest centroid, and store
%               the index inside idx at the appropriate location.
%               Concretely, idx(i) should contain the index of the centroid
%               closest to example i. Hence, it should be a value in the 
%               range 1..K
%
for loop=1:numberOfExamples
    Distance = sum(bsxfun(@minus,X(loop,:),centroids).^2,2);
    [value index] = min(Distance);
    idx(loop) = index;
end;


end

更新

这个函数应该能返回距离，注意上面的 Matlab 代码只是返回了最近的中心点的距离（和索引）……而你的函数会返回所有的距离，下面的代码也是如此。

def FindDistance(X,centroids):
K=shape(centroids)[0]
examples, dimensions = shape(X)
distance = zeros((examples,K))
for ex in xrange(examples):
    distance[ex,:] = np.sum((X[ex,:]-centroids)**2,1)
return distance

回答于 2025-04-17 由 Python大师

分享举报

试试这个：

np.sqrt(((points[np.newaxis] - centroids[:,np.newaxis]) ** 2).sum(axis=2)).argmin(axis=0)

或者：

diff = points[np.newaxis] - centroids[:,np.newaxis]
norm = np.sqrt((diff*diff).sum(axis=2))
closest = norm.argmin(axis=0)

别问这在干嘛 :D

编辑：开玩笑的。中间的广播操作（points[np.newaxis] - centroids[:,np.newaxis]）是把原来的数组变成两个三维数组。结果是每个“平面”里都包含了所有点和某个中心点之间的差值。我们可以把这个叫做 diffs。

接着，我们用常规的方法来计算欧几里得距离（就是差值的平方和再开平方）：np.sqrt((diffs ** 2).sum(axis=2))。最后我们得到一个 (k, m) 的矩阵，其中第0行包含了到 centroids[0] 的距离，依此类推。所以，.argmin(axis=0) 就能给你想要的结果。

回答于 2025-04-17 由 Python大师

分享举报

使用numpy在矩阵上计算距离

我的回答

2 个回答

撰写回答