Python-Kmeans可视化(高维)

2024-04-20 05:34:13 发布

您现在位置:Python中文网/ 问答频道 /正文

enter image description here我必须用Python对超过15个维度的客户进行聚类。 你能告诉我,用T-SNE方法在Kmeans之后可视化集群是正确的吗?我收到了非常好的绘图,即使在我的数据帧异常值。这让我怀疑我是否做得对。我的同事在R上进行高维聚类,他们没有使用任何降维方法,比如PCA或t-SNE,他们注意到我可能不正确。 这是我第一次体验。事先谢谢你的帮助。你知道吗

我从示例中编写代码:https://www.kaggle.com/minc33/visualizing-high-dimensional-clusters

我的代码:

#import libraries
import numpy as np
import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import pandas.io.sql as psql
import plotly.graph_objs as go
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE 
from sklearn.cluster import KMeans
from scipy import stats
import matplotlib.pyplot as plt
#download my dataframe
data = pd.read_csv ('C:\\Users\\Desktop\python\ex.csv')
#use data without customer id
d=data.iloc[:,1:20]
#Kmeans
X =d.copy()
scaler = MinMaxScaler()
numer = pd.DataFrame(scaler.fit_transform(X))
kmeans = KMeans(n_clusters=3)
kmeans.fit(numer)
clusters = kmeans.predict(numer)
numer["Cluster"] = clusters
#visualisation
plotX = pd.DataFrame(np.array(numer.sample(10000)))
plotX.columns =numer.columns
perplexity = 50
tsne_2d = TSNE(n_components=2, perplexity=perplexity)
TCs_2d = pd.DataFrame(tsne_2d.fit_transform(plotX.drop(["Cluster"], axis=1)))
TCs_2d.columns = ["TC1_2d","TC2_2d"]
plotX = pd.concat([plotX,TCs_2d], axis=1, join='inner')
cluster0 = plotX[plotX["Cluster"] == 0]
cluster1 = plotX[plotX["Cluster"] == 1]
cluster2 = plotX[plotX["Cluster"] == 2]
trace1 = go.Scatter(
                    x = cluster0["TC1_2d"],
                    y = cluster0["TC2_2d"],
                    mode = "markers",
                    name = "Cluster 0",
                    marker = dict(color = 'rgba(255, 128, 255, 0.8)'),
                    text = None)
#trace2 is for 'Cluster 1'
trace2 = go.Scatter(
                    x = cluster1["TC1_2d"],
                    y = cluster1["TC2_2d"],
                    mode = "markers",
                    name = "Cluster 1",
                    marker = dict(color = 'rgba(255, 128, 2, 0.8)'),
                    text = None)
#trace3 is for 'Cluster 2'
trace3 = go.Scatter(
                    x = cluster2["TC1_2d"],
                    y = cluster2["TC2_2d"],
                    mode = "markers",
                    name = "Cluster 2",
                    marker = dict(color = 'rgba(0, 255, 200, 0.8)'),
                    text = None)
data = [trace1, trace2, trace3]
title = "Visualizing Clusters in Two Dimensions Using T-SNE (perplexity=" + str(perplexity) + ")"
layout = dict(title = title,
              xaxis= dict(title= 'TC1',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'TC2',ticklen= 5,zeroline= False)
             )
fig = dict(data = data, layout = layout)
plot(fig)

Tags: fromimportdatatitlemodeasdictpd