SciPy: clustering binary data and labels

-1 votes

1 answer

1455 views

Asked 2025-04-18 03:34

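I am trying to run k-means clustering on a binary dataset. The matrix below is based on web page visits ('1' means visited, '0' means not visited). The first column is a label that identifies each user.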

0,1,1,0,1,0,1,0,1,1,0
1,1,0,0,1,1,0,1,0,1,0
2,1,0,0,0,1,0,1,0,1,1
3,1,0,1,0,1,0,0,0,1,0
4,0,1,1,1,0,1,0,1,0,0
5,1,1,0,0,1,0,1,1,1,1
6,0,0,1,0,1,1,0,1,0,0
7,1,1,0,1,0,1,0,0,1,0
8,1,0,0,0,1,0,1,1,1,1
9,0,1,1,0,1,0,1,0,0,0

I am using SciPy's k-means clustering and followed this tutorial. In the end, I want to know which cluster each user belongs to. For example, with k = 3:

0 - cluster_1
1 - cluster_0
2 - cluster_1
3 - cluster_3
.. - ....

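Here is what I have tried, but the binary data does not seem to be clustered correctly. Is there a way to improve this so that I get the result I expect?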

import numpy as np
from matplotlib.pyplot import plot, show
from scipy.cluster.vq import kmeans, vq

# Visit matrix: one row per user, one column per page (user-label column omitted)
data = np.array([[1,0,0,1,1,0,1,0,1,0],
[1,0,0,0,1,0,1,0,1,1],
[1,0,1,0,1,0,0,0,1,0],
[0,1,1,1,0,1,0,1,0,0],
[1,1,0,0,1,0,1,1,1,1],
[0,0,1,0,1,1,0,1,0,0],
[1,1,0,1,0,1,0,0,1,0],
[1,0,0,0,1,0,1,1,1,1],
[0,1,1,0,1,0,1,0,0,0],
[1,1,0,1,0,1,0,1,1,0]])

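# Cast to float for SciPy's k-means routines, then fit k = 2 centroids
obs = data.astype(float)
centroids, _ = kmeans(obs, 2)
idx, _ = vq(obs, centroids)
# Scatter only the first two columns (pages 0 and 1), colored by cluster
plot(data[idx==0,0], data[idx==0,1], 'ob',
     data[idx==1,0], data[idx==1,1], 'or')
# Mark the centroids, again using only their first two coordinates
plot(centroids[:,0], centroids[:,1], 'sg', markersize=8)
show()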
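One likely reason the plot looks wrong is that it draws only columns 0 and 1 of a ten-column 0/1 matrix, so every user lands on one of at most four corner points regardless of cluster. The sketch below checks this and, as a swapped-in visualization aid that is not part of the original code, projects the rows onto their first two principal components with NumPy's SVD. The names `distinct_xy`, `centered`, and `projected` are illustrative assumptions.

```python
import numpy as np

# Same visit matrix as in the code above, stored as floats
data = np.array([[1,0,0,1,1,0,1,0,1,0],
                 [1,0,0,0,1,0,1,0,1,1],
                 [1,0,1,0,1,0,0,0,1,0],
                 [0,1,1,1,0,1,0,1,0,0],
                 [1,1,0,0,1,0,1,1,1,1],
                 [0,0,1,0,1,1,0,1,0,0],
                 [1,1,0,1,0,1,0,0,1,0],
                 [1,0,0,0,1,0,1,1,1,1],
                 [0,1,1,0,1,0,1,0,0,0],
                 [1,1,0,1,0,1,0,1,1,0]], dtype=float)

# Distinct (x, y) positions produced when only columns 0 and 1 are plotted;
# for 0/1 data there can be at most four of them
distinct_xy = np.unique(data[:, :2], axis=0)
print(len(distinct_xy))

# Assumption: PCA via SVD, used here only to get a 2-D view of all ten columns
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[:2].T  # shape (n_users, 2)
print(projected.shape)
```

`projected[:, 0]` and `projected[:, 1]` could then be passed to `plot` in place of `data[:, 0]` and `data[:, 1]`. The clustering itself is unaffected, since this changes only what is drawn.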

scipy machine learning data analysis k-means binary data clustering user identification web traffic

1 Answer

Please read the documentation more closely instead of just copying and pasting code from the web.

idx,_ = vq(data,centroids)

Have you thought about what idx actually is?
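To make the hint concrete: `vq` returns one code per row of the observations, so `idx[i]` is the index of the centroid nearest to row `i`, which is the per-user cluster the question asks for. Below is a minimal sketch, assuming the labelled matrix from the question is loaded as a NumPy array. The names `raw`, `user_ids`, `features`, and `assignments` are illustrative. Passing the first three rows as initial centroids is an assumption made only so the run is repeatable; passing `3` instead lets SciPy choose random starting observations.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Matrix from the question: column 0 is the user label, columns 1..10 are page visits
raw = np.array([[0,1,1,0,1,0,1,0,1,1,0],
                [1,1,0,0,1,1,0,1,0,1,0],
                [2,1,0,0,0,1,0,1,0,1,1],
                [3,1,0,1,0,1,0,0,0,1,0],
                [4,0,1,1,1,0,1,0,1,0,0],
                [5,1,1,0,0,1,0,1,1,1,1],
                [6,0,0,1,0,1,1,0,1,0,0],
                [7,1,1,0,1,0,1,0,0,1,0],
                [8,1,0,0,0,1,0,1,1,1,1],
                [9,0,1,1,0,1,0,1,0,0,0]])

user_ids = raw[:, 0]                 # keep the labels out of the clustering
features = raw[:, 1:].astype(float)  # k-means operates on the visit columns only

# Assumption: fixed initial centroids (first three users) for a repeatable run
centroids, _ = kmeans(features, features[:3].copy())

# idx has one entry per row of features, in the same order as user_ids
idx, _ = vq(features, centroids)

assignments = {int(u): f"cluster_{int(c)}" for u, c in zip(user_ids, idx)}
for user, cluster in assignments.items():
    print(user, "-", cluster)
```

Because `idx` follows the row order of the input, pairing it with the label column is all that is needed to get the `user - cluster` listing. Cluster numbers start at 0, so with k = 3 the names run from `cluster_0` to `cluster_2`.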

Answered 2025-04-18 by Python大师
