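I'm trying to do multidimensional scaling with sklearn, pandas and numpy. The data file I'm using has 10 numeric columns and no missing values. I'm trying to visualize this 10-dimensional data in 2 dimensions using multidimensional scaling from sklearn.manifold, as follows: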
import numpy as np
import pandas as pd
from sklearn import manifold
from sklearn.metrics import euclidean_distances
seed = np.random.RandomState(seed=3)
data = pd.read_csv('data/big-file.csv')
# Start small: don't take all the data,
# it's about 200k records.
subset = data[:10000]
similarities = euclidean_distances(subset)
mds = manifold.MDS(n_components=2, max_iter=3000, eps=1e-9,
                   random_state=seed, dissimilarity="precomputed", n_jobs=1)
pos = mds.fit(similarities).embedding_
But I get this ValueError:
Traceback (most recent call last):
  File "demo/mds-demo.py", line 18, in <module>
    pos = mds.fit(similarities).embedding_
  File "/Users/dwilliams/Desktop/Anaconda/lib/python2.7/site-packages/sklearn/manifold/mds.py", line 360, in fit
    self.fit_transform(X, init=init)
  File "/Users/dwilliams/Desktop/Anaconda/lib/python2.7/site-packages/sklearn/manifold/mds.py", line 395, in fit_transform
    eps=self.eps, random_state=self.random_state)
  File "/Users/dwilliams/Desktop/Anaconda/lib/python2.7/site-packages/sklearn/manifold/mds.py", line 242, in smacof
    eps=eps, random_state=random_state)
  File "/Users/dwilliams/Desktop/Anaconda/lib/python2.7/site-packages/sklearn/manifold/mds.py", line 73, in _smacof_single
    raise ValueError("similarities must be symmetric")
ValueError: similarities must be symmetric
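I thought euclidean_distances returned a symmetric matrix. What am I doing wrong, and how do I fix it?

I just ran into the same problem. Another solution, which I think is more efficient, is to compute the distances for the upper triangle only and then copy them into the lower half.
You can do that with scipy as follows: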
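A minimal sketch of that approach, assuming `scipy.spatial.distance.pdist` and `squareform` are the scipy functions meant here. The array `X` is a hypothetical random stand-in for `subset`, not the asker's data:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.RandomState(3)
# Hypothetical stand-in for `subset`: 200 rows, 10 numeric columns.
X = rng.rand(200, 10).astype(np.float32)

# pdist computes each pairwise distance once (the upper triangle only),
# returning a condensed 1-D array of length n * (n - 1) / 2.
condensed = pdist(X, metric="euclidean")

# squareform mirrors the condensed values into a full n x n matrix with a
# zero diagonal, so entry (i, j) and entry (j, i) are the same number.
similarities = squareform(condensed)

print(similarities.shape)
print(np.array_equal(similarities, similarities.T))
```

Because both halves are filled from the same values, the result should pass the symmetry check, and it can be handed to `mds.fit(similarities)` exactly as in the question.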
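I ran into the same problem. It turned out that my data was an np.float32 array, and the reduced floating-point precision made the distance matrix asymmetric. I fixed it by converting the data to np.float64 before running MDS. Here is an example that uses random data to illustrate the problem: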
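A rough sketch of such a check, under a few assumptions: the random array is a stand-in for real data, `max_asymmetry` is a hypothetical helper introduced only for this illustration, and whether the float32 result is actually asymmetric depends on the installed sklearn and BLAS versions, so no particular output is claimed.

```python
import numpy as np
from sklearn.metrics import euclidean_distances


def max_asymmetry(d):
    """Largest absolute difference between d and its transpose.

    Hypothetical helper for this illustration; 0.0 means exactly symmetric.
    """
    return float(np.abs(d - d.T).max())


rng = np.random.RandomState(3)
# Random stand-in data stored as float32, as in the situation described above.
X32 = rng.rand(500, 10).astype(np.float32)

# Distances computed from float32 input; rounding may leave d[i, j] != d[j, i].
d32 = euclidean_distances(X32)
print("float32 input, max asymmetry:", max_asymmetry(d32))

# The fix described above: cast to float64 before computing the distances.
X64 = X32.astype(np.float64)
d64 = euclidean_distances(X64)
print("float64 input, max asymmetry:", max_asymmetry(d64))

# Optional extra safeguard (an assumption, not part of the original answer):
# averaging with the transpose gives an exactly symmetric matrix, because
# a + b == b + a in floating-point arithmetic.
d_sym = (d64 + d64.T) / 2.0
print("symmetrized, max asymmetry:", max_asymmetry(d_sym))
```

If the float64 cast alone still leaves tiny differences on a given setup, the symmetrized `d_sym` can be passed to `mds.fit` instead.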