Plotting multi-dimensional clusters as a 2D graph in Python

3 votes
2 answers
10665 views
Asked 2025-04-30 13:25

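I'm clustering a large amount of data that falls into two different categories.

The first category is 6-dimensional and the second is 12-dimensional. For now I've decided to use the kmeans algorithm, because it looks like the simplest clustering method to understand and suits a beginner like me.

I'd like to know how to show these clusters on a two-dimensional plot so that I can judge whether kmeans is working. I'd prefer to use matplotlib, but other Python libraries are fine too.

Category 1 is made up of these data types: (int, float, float, int, float, int).

Category 2 is made up of 12 floats.

I'd like to get output similar to this [image]. Any suggestions would be very helpful.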


2 Answers

0
   plot_cluster(X[:], kmean.cluster_centers_, kmean.labels_, clusters)


1

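After searching the web and wading through a lot of odd, uncommented solutions, I finally figured out how to do this. If you want to do something similar, here is the code. It is pieced together from various sources, and much of it I wrote or modified myself. I hope it is easier to follow than what you'll find elsewhere.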

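This function builds on kmeans2 from scipy, which returns a list of centroids and a list of labels. kmeansdata is the numpy array passed to kmeans2 for clustering, and num_clusters is the number of clusters passed to kmeans2.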
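
For readers who have not run the clustering step yet, here is a minimal sketch of how those inputs might be produced. The data is random and made up purely for illustration, and the choice of `minit='points'` is an assumption of this sketch, not something the answer specifies.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Made-up stand-in data: 300 samples with 6 features each
rng = np.random.RandomState(0)
kmeansdata = rng.rand(300, 6)
num_clusters = 3

# kmeans2 returns the centroids and, for every sample, the index
# of the centroid it was assigned to
centroid_list, label_list = kmeans2(kmeansdata, num_clusters, minit='points')

print(centroid_list.shape)  # one row per cluster, one column per feature
print(label_list.shape)     # one label per sample
```

These two return values, together with `kmeansdata` and `num_clusters`, are what the plotting function further down expects as arguments.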

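The code writes a new png file on every run, so it never overwrites an existing one. It also plots only 50 clusters (if you have thousands of clusters, don't try to output them all).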
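
The no-overwrite behaviour can also be pulled out into a small helper. This is only a sketch: `next_free_png`, its parameters, and the temporary directory used for the demonstration are hypothetical names introduced here, not part of the original code.

```python
import os
import tempfile


def next_free_png(prefix, directory="."):
    """Return the first path of the form <prefix><index>.png that does not exist yet."""
    index = 0
    while True:
        candidate = os.path.join(directory, "%s%d.png" % (prefix, index))
        if not os.path.exists(candidate):
            return candidate
        index += 1


# Demonstration in a throwaway directory
demo_dir = tempfile.mkdtemp()
first = next_free_png("name", demo_dir)
open(first, "w").close()  # pretend a plot was saved under this name
second = next_free_png("name", demo_dir)
print(os.path.basename(first), os.path.basename(second))
```

With a helper like this, the plotting code would only need a single `plt.savefig(next_free_png("name"))` call.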

(This code was written for python2.7; I expect it to work on other versions too.)

import numpy
import colorsys
import random
import os
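# Note: matplotlib.mlab.PCA was removed in matplotlib 3.1, so this import
# only works with an older matplotlib release
from matplotlib.mlab import PCA as mlabPCA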
from matplotlib import pyplot as plt


def get_colors(num_colors):
    """
    Function to generate a list of randomly generated colors
    The function first generates 256 different colors and then
    we randomly select the number of colors required from it
    num_colors        -> Number of colors to generate
    colors            -> Consists of 256 different colors
    random_colors     -> Randomly returns required(num_color) colors
    """
    colors = []
    random_colors = []
    # Generate 256 different colors and choose num_colors randomly
    for i in numpy.arange(0., 360., 360. / 256.):
        hue = i / 360.
        lightness = (50 + numpy.random.rand() * 10) / 100.
        saturation = (90 + numpy.random.rand() * 10) / 100.
        colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))

    for i in range(0, num_colors):
        random_colors.append(colors[random.randint(0, len(colors) - 1)])
    return random_colors


def random_centroid_selector(total_clusters , clusters_plotted):
    """
    Function to generate a list of randomly selected
    centroids to plot on the output png
    total_clusters        -> Total number of clusters
    clusters_plotted      -> Number of clusters to plot
    random_list           -> Contains the index of clusters
                             to be plotted
    """
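    # Sample without replacement so no cluster index is picked twice;
    # cap the count in case there are fewer clusters than requested
    count = min(clusters_plotted, total_clusters)
    random_list = random.sample(range(total_clusters), count)
    return random_list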

def plot_cluster(kmeansdata, centroid_list, label_list, num_cluster):
    """
    Function to project the n-dimensional clusters onto
    2 dimensions and plot 50 randomly chosen clusters
    name%d.png    -> file where the output is stored, indexed
                     by the first available file index
                     e.g. name0.png , name1.png ...
    """
    mlab_pca = mlabPCA(kmeansdata)
    cutoff = mlab_pca.fracs[1]
    users_2d = mlab_pca.project(kmeansdata, minfrac=cutoff)
    centroids_2d = mlab_pca.project(centroid_list, minfrac=cutoff)


    colors = get_colors(num_cluster)
    plt.figure()
    plt.xlim([users_2d[:, 0].min() - 3, users_2d[:, 0].max() + 3])
    plt.ylim([users_2d[:, 1].min() - 3, users_2d[:, 1].max() + 3])

    # Plotting 50 clusters only for now
    random_list = random_centroid_selector(num_cluster , 50)

    # Plotting only the centroids which were randomly_selected
    # Centroids are represented as a large 'o' marker
    for i, position in enumerate(centroids_2d):
        if i in random_list:
            plt.scatter(centroids_2d[i, 0], centroids_2d[i, 1], marker='o', c=colors[i], s=100)


    # Plotting only the points whose centers were plotted
    # Points are represented as a small '+' marker
    for i, position in enumerate(label_list):
        if position in random_list:
            plt.scatter(users_2d[i, 0], users_2d[i, 1] , marker='+' , c=colors[position])

    filename = "name"
    i = 0
    while True:
        if not os.path.isfile(filename + str(i) + ".png"):
            # First unused index found: write the file and stop
            plt.savefig(filename + str(i) + ".png")
            break
        # Index already taken: try the next one
        i = i + 1
    return
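
Since `matplotlib.mlab.PCA` is no longer available in matplotlib 3.1 and later, the projection step needs a replacement there. One possible replacement is a small PCA built on `numpy.linalg.svd`. This is a sketch of a swapped-in technique, not the original author's method: `fit_pca_2d` and `project` are hypothetical names, the data is made up, and standardising each column before the decomposition is an assumption of this sketch.

```python
import numpy as np


def fit_pca_2d(data):
    """Fit a 2-component PCA on `data` and return a function that projects points."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns

    # Rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd((data - mean) / std, full_matrices=False)
    axes = vt[:2].T  # shape: (n_features, 2)

    def project(points):
        # Apply the same centring and scaling as the fitted data, then rotate
        points = np.asarray(points, dtype=float)
        return ((points - mean) / std).dot(axes)

    return project


# Made-up data; the first three rows stand in for centroids
rng = np.random.RandomState(0)
data = rng.rand(200, 6)
centroids = data[:3]

project = fit_pca_2d(data)
points_2d = project(data)
centroids_2d = project(centroids)
print(points_2d.shape, centroids_2d.shape)
```

Fitting once and reusing the same `project` function for both the samples and the centroids keeps them in the same 2D coordinate system, which is what the two `mlab_pca.project` calls do in the code above.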
