Computing Euclidean distances on PCA results in Python
I have a numpy array of 3-D points (one point per row, three columns) that I use for principal component analysis (PCA). It looks like this:
pcar = [[xa, ya, za],
        [xb, yb, zb],
        [xc, yc, zc],
        ...
        [xn, yn, zn]]
Each row of this array represents a point. From the PCA result above, I randomly pick two rows to form a cluster:
out_list=pcar[numpy.random.randint(0,pcar.shape[0],2)]
This gives a numpy array containing two rows.
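One detail worth checking: `randint` draws with replacement, so both indices can occasionally be the same row. A minimal sketch, assuming you want two distinct rows, swaps in `np.random.choice` with `replace=False` (this substitution is a suggestion, not part of the original question):

```python
import numpy as np

pcar = np.random.rand(10, 3)

# randint samples with replacement, so both indices may be equal
idx_maybe_same = np.random.randint(0, pcar.shape[0], 2)

# choice(..., replace=False) always returns two distinct row indices
idx = np.random.choice(pcar.shape[0], size=2, replace=False)
out_list = pcar[idx]
```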
Next, I need to compute the Euclidean distance between each row of out_list and every row (point) of pcar, and then add the nearest point in pcar to the out_list cluster.
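The task as stated can be sketched in a few lines of NumPy. This assumes "nearest" means the point of pcar, not already in the cluster, that is closest to any current cluster member; both that interpretation and the sampling with `replace=False` are assumptions, not taken from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
pcar = rng.random((10, 3))
idx = rng.choice(pcar.shape[0], size=2, replace=False)  # two distinct rows
out_list = pcar[idx]

# Distance from each cluster row to every point in pcar: shape (2, 10)
dists = np.linalg.norm(out_list[:, None, :] - pcar[None, :, :], axis=2)

# Assumption: points already in the cluster are not candidates
dists[:, idx] = np.inf

# Column of the overall minimum = nearest remaining point to any cluster member
nearest = np.unravel_index(np.argmin(dists), dists.shape)[1]
out_list = np.vstack([out_list, pcar[nearest]])
```

Repeating the last three statements (with `idx` extended by `nearest`) grows the cluster one point at a time.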
2 answers
Edit: OK, I downloaded and installed numpy and taught myself a bit of it. Here is a numpy version of the code.
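Old answer
I know you wanted a numpy answer. My numpy is a bit rusty, but since there are no other answers, I thought I'd give you a Matlab version. Converting it to numpy should be straightforward. I'm assuming you care about the concept rather than the code.
Note that there are many ways to solve this problem; I'm just presenting one of them.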
Working NumPy version
import numpy as np

pcar = np.random.rand(10, 3)
out_list = pcar[np.random.randint(0, pcar.shape[0], 2)]
ol_1 = out_list[0, :]
ol_2 = out_list[1, :]
# Get the individual distances.
# NumPy broadcasting subtracts the 1x3 vector ol from every row of the
# 10x3 array pcar, so no explicit replication with ones() is needed
d1 = pcar - ol_1
d2 = pcar - ol_2
# Square them using an element-wise square
d1s = np.square(d1)
d2s = np.square(d2)
# Sum across the rows, not down columns
d1ss = np.sum(d1s, axis=1)
d2ss = np.sum(d2s, axis=1)
# Square root using an element-wise square-root
e1 = np.sqrt(d1ss)
e2 = np.sqrt(d2ss)
# Assign to class one or class two:
# start by assigning 1 to everything, then select all points where ol_2
# is closer and assign them the number 2
assign = np.ones(e1.shape[0], dtype=int)
assign[e2 < e1] = 2
# Separate
pcar1 = pcar[assign == 1, :]
pcar2 = pcar[assign == 2, :]
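The subtract, square, sum and square-root steps above are exactly what `np.linalg.norm(..., axis=...)` computes, so the same two-class assignment can be written more compactly. A sketch, assuming two distinct starting points; ties go to class 1, matching `assign[e2 < e1] = 2` above:

```python
import numpy as np

rng = np.random.default_rng(1)
pcar = rng.random((10, 3))
out_list = pcar[rng.choice(pcar.shape[0], size=2, replace=False)]

# Distances from every point to each chosen point, shape (10, 2):
# pcar (10, 1, 3) broadcast against out_list (1, 2, 3), norm over the last axis
e = np.linalg.norm(pcar[:, None, :] - out_list[None, :, :], axis=2)

# Label 1 or 2 by the closer chosen point; argmin picks the first index on ties
assign = np.argmin(e, axis=1) + 1

pcar1 = pcar[assign == 1]
pcar2 = pcar[assign == 2]
```

The same code works unchanged for more than two chosen points, since only the shape of out_list changes.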
Working Matlab version
close all
clear all
% Create 10 records each with 3 attributes
pcar = rand(10, 3)
% Pick two (normally at random of course)
out_list = pcar(1:2, :)
% Hard-coding this separately, though this can be done iteratively
ol_1 = out_list(1,:)
ol_2 = out_list(2,:)
% Get the individual distances
% The trick here is to pre-multiply the 1x3 ol vector with a row of
% ones of size 10x1 to get a 10x3 array with ol replicated, so that it
% can simply be subtracted
d1 = pcar - ones( size(pcar,1), 1)*ol_1
d2 = pcar - ones( size(pcar,1), 1)*ol_2
% Square them using an element-wise square
d1s = d1.^2
d2s = d2.^2
% Sum across the rows, not down columns
d1ss = sum(d1s, 2)
d2ss = sum(d2s, 2)
% Square root using an element-wise square-root
e1 = sqrt(d1ss)
e2 = sqrt(d2ss)
% Assign to class one or class two
% Start by assigning one to everything, then select all those where ol_2
% is closer and assign them the number 2
assign = ones(length(e1),1);
assign(e2<e1)=2
% Separate
pcar1 = pcar( assign==1, :)
pcar2 = pcar( assign==2, :)
% Plot
plot3(pcar1(:,1), pcar1(:,2), pcar1(:,3), 'g+')
hold on
plot3(pcar2(:,1), pcar2(:,2), pcar2(:,3), 'r+')
plot3(ol_1(1), ol_1(2), ol_1(3), 'go')
plot3(ol_2(1), ol_2(2), ol_2(3), 'ro')
SciPy has a very fast implementation:
from scipy.spatial.distance import cdist, pdist
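cdist takes two collections of points (2-D arrays such as your pcar and out_list) and computes the distance between every pair of points, one from each collection.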
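Applied to this question, a minimal sketch might look like the following (it assumes out_list holds two distinct rows of pcar; `metric='euclidean'` is cdist's default and is spelled out only for clarity):

```python
import numpy as np
from scipy.spatial.distance import cdist

pcar = np.random.rand(10, 3)
out_list = pcar[np.random.choice(pcar.shape[0], size=2, replace=False)]

# D[i, j] is the Euclidean distance between out_list[i] and pcar[j]; shape (2, 10)
D = cdist(out_list, pcar, metric='euclidean')

# For each point in pcar, the index (0 or 1) of the closer chosen point
labels = np.argmin(D, axis=0)
```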
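pdist, by contrast, computes the pairwise distances within a single collection of points and returns only the upper triangle of that distance matrix, flattened into a 1-D "condensed" array.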
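If the full square matrix is needed, `squareform` expands the condensed output. A short sketch on random data of the same shape as pcar:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

pcar = np.random.rand(10, 3)

# Condensed form: the n*(n-1)/2 distances above the diagonal, as a 1-D array
d = pdist(pcar, metric='euclidean')

# squareform expands it to the full symmetric (n, n) matrix with zeros on the diagonal
M = squareform(d)
```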
Both cdist and pdist are implemented in compiled code under the hood, so they run much faster than equivalent pure-Python loops.