How to get the individual centroids of K-means clusters using nltk (Python)

Posted 2024-04-19 16:20:25


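I'm using nltk to perform k-means clustering because I want to change the distance metric to cosine distance. However, how do I get the centroids of all the clusters?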

kclusterer = KMeansClusterer(8, distance = nltk.cluster.util.cosine_distance, repeats = 1)
predict = kclusterer.cluster(features, assign_clusters = True)
centroids = kclusterer._centroid
df_clustering['cluster'] = predict
#df_clustering['centroid'] = centroids[df_clustering['cluster'] - 1].tolist()
df_clustering['centroid'] = centroids

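I'm trying to run k-means clustering on a pandas DataFrame, and I want the coordinates of each data point's cluster centroid stored in the DataFrame column "centroid".

Thanks in advance!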


1 Answer
User
#1 · Posted 2024-04-19 16:20:25
import nltk
import numpy as np
import pandas as pd
from nltk.cluster import KMeansClusterer

# Create a dummy DataFrame with 3 features
df = pd.DataFrame([[1, 2, 3], [50, 51, 52], [2.0, 6.0, 8.5], [50.11, 53.78, 52]], columns=['feature1', 'feature2', 'feature3'])
print(df)


obj = KMeansClusterer(2, distance=nltk.cluster.util.cosine_distance)  # request 2 clusters
vectors = [np.array(f) for f in df.values]

# cluster() with assign_clusters=True returns one cluster index per input vector
df['predicted_cluster'] = obj.cluster(vectors, assign_clusters=True)


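print(obj.means())
# Output (the order of the clusters may differ between runs, since the initial means are chosen randomly):
# [array([50.055, 52.39 , 52.   ]), array([1.5 , 4.  , 5.75])]
# Each array holds the mean of the three features for one cluster; there are two arrays because we asked for 2 clusters.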
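
The values printed above match the plain per-feature averages of the rows in each cluster, which is easy to check by hand. Here is a minimal NumPy sketch of that check; the `labels` array and the helper `cluster_means` are hypothetical and hard-coded for illustration (they are not produced by nltk), so treat it as a sanity check rather than a description of what the library does internally.

```python
import numpy as np

# Same four rows as the dummy DataFrame above.
data = np.array([
    [1.0, 2.0, 3.0],
    [50.0, 51.0, 52.0],
    [2.0, 6.0, 8.5],
    [50.11, 53.78, 52.0],
])

# Assumed cluster assignment, hard-coded to match the output shown above:
# cluster 0 holds the two "large" rows, cluster 1 the two "small" rows.
labels = np.array([1, 0, 1, 0])


def cluster_means(data, labels):
    """Return one mean vector per cluster label, ordered by label."""
    return np.array([data[labels == k].mean(axis=0) for k in np.unique(labels)])


means = cluster_means(data, labels)
print(means)
```

If your own run assigns the cluster indices the other way round, the two rows of `means` simply swap places.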

# To store each row's cluster centroid in the pandas DataFrame:
df['centroid'] = df['predicted_cluster'].apply(lambda x: obj.means()[x])
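
If a single column of arrays is awkward to work with, one alternative is to look the centroids up with NumPy integer indexing and spread the coordinates over one column per feature. This is only a sketch under stated assumptions: `means` and the `predicted_cluster` values are hard-coded stand-ins for `obj.means()` and the result of `obj.cluster(...)`, and the `centroid_*` column names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [50, 51, 52], [2.0, 6.0, 8.5], [50.11, 53.78, 52]],
    columns=["feature1", "feature2", "feature3"],
)

# Stand-ins for the nltk results (assumed values, copied from the output above).
means = [np.array([50.055, 52.39, 52.0]), np.array([1.5, 4.0, 5.75])]
df["predicted_cluster"] = [1, 0, 1, 0]

# Stack the means into an (n_clusters, n_features) array, then index it with
# every row's label to get an (n_rows, n_features) array of centroids.
centroids = np.vstack(means)[df["predicted_cluster"].to_numpy()]

# One column per centroid coordinate (hypothetical column names).
centroid_cols = ["centroid_feature1", "centroid_feature2", "centroid_feature3"]
centroid_df = pd.DataFrame(centroids, columns=centroid_cols, index=df.index)
df = df.join(centroid_df)

print(df)
```

Keeping the coordinates in separate numeric columns makes later steps such as computing each row's distance to its centroid straightforward column arithmetic.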

