Clustering data points in Python with sklearn KMeans

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

data = np.array([[30, 17, 10, 32, 32], [18, 20, 6, 20, 15], [10, 8, 10, 20, 21],
                 [3, 16, 20, 10, 17], [3, 15, 21, 17, 20]])

kmeans_clustering = KMeans(n_clusters=3)
idx = kmeans_clustering.fit_predict(data)

# use t-SNE
X = TSNE(n_components=2).fit_transform(data)

fig = plt.figure(1)
plt.clf()

# plot graph
colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
plt.scatter(X[:, 0], X[:, 1], c=colors[kmeans_clustering.labels_])
plt.title('K-Means (t-SNE)')
plt.show()

2 Answers

User

Answer 1 · edited 2024-05-18 23:44:14

You can also use PCA (principal component analysis) instead of t-SNE to plot your clusters:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

data = np.array([[30, 17, 10, 32, 32], [18, 20, 6, 20, 15], [10, 8, 10, 20, 21],
                 [3, 16, 20, 10, 17], [3, 15, 21, 17, 20]])
kmeans = KMeans(n_clusters=3)
labels = kmeans.fit_predict(data)

# Project the 5-dimensional points onto the first two principal components
pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)
data_reduced = pd.DataFrame(data_reduced)

ax = data_reduced.plot(kind='scatter', x=0, y=1, c=labels, cmap='rainbow')
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_title('Projection of the clusters onto the PCA axes')

# Annotate every point with the cluster it was assigned to
for x, y, label in zip(data_reduced[0], data_reduced[1], kmeans.labels_):
    ax.annotate('Cluster {0}'.format(label), (x, y))

plt.show()
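
Before trusting a 2-D PCA plot, it can help to check how much of the original variance the two components retain. The snippet below is a minimal sketch of that check on the same data; it relies on the `explained_variance_ratio_` attribute of a fitted `PCA`, and the variable name `ratio` is only illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[30, 17, 10, 32, 32], [18, 20, 6, 20, 15], [10, 8, 10, 20, 21],
                 [3, 16, 20, 10, 17], [3, 15, 21, 17, 20]])

# Same projection as in the answer above
pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)

# Fraction of the total variance carried by PC1 and PC2
ratio = pca.explained_variance_ratio_
print('PC1: {:.2%}, PC2: {:.2%}, total: {:.2%}'.format(ratio[0], ratio[1], ratio.sum()))
```

If the total is close to 100%, distances in the plot are a reasonable stand-in for distances in the original 5-dimensional space; if it is low, clusters that look mixed in the plot may still be well separated in the full data.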

User

Answer 2 · edited 2024-05-18 23:44:14

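Adjust the perplexity parameter of TSNE. Its default value is 30, which seems too high for your case (you only have 5 data points), even though the documentation states that t-SNE is quite insensitive to this parameter: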

The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. The choice is not extremely critical since t-SNE is quite insensitive to this parameter.

X = TSNE(n_components=2, perplexity=2.0).fit_transform( data )
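
To see the effect of this parameter on such a small data set, one option is to compute the embedding for a few perplexity values and compare them. The following is a rough sketch, not a recommendation of specific values: the candidate perplexities (2.0, 3.0, 4.0), the fixed `random_state`, and the `embeddings` dictionary are assumptions made for illustration. The values are kept below the number of samples (5) because, as far as I know, recent scikit-learn releases reject a perplexity that is not smaller than the sample count.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

data = np.array([[30, 17, 10, 32, 32], [18, 20, 6, 20, 15], [10, 8, 10, 20, 21],
                 [3, 16, 20, 10, 17], [3, 15, 21, 17, 20]])

# Cluster in the original 5-dimensional space; t-SNE is used only for display
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)

# Illustrative perplexity values, all smaller than the number of samples
embeddings = {}
for perplexity in (2.0, 3.0, 4.0):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(data)
    print('perplexity={}: embedding shape {}'.format(perplexity, embeddings[perplexity].shape))
```

Each entry of `embeddings` can then be passed to `plt.scatter` with `c=labels`, exactly as in the question, to compare the resulting plots side by side.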
