Python中基于多元线性回归的数据聚类

2024-04-20 08:27:56 发布

您现在位置:Python中文网/ 问答频道 /正文

我想将数据分开,并将它们放入13个不同的变量集中,比如每个红色圆圈(见下图)。但是我不知道如何基于多元线性回归对数据进行聚类。你知道我怎么用Python做这个吗?在

I don't read what I write

数据集: https://www.dropbox.com/s/ar5rzry0joe9ffu/dataset_v1.xlsx?dl=

我现在用于群集的代码:

print(__doc__)

import openpyxl
import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt


wb = openpyxl.load_workbook('dataset_v1.xlsx')
sheet = wb.worksheets[0]
ws = wb.active
row_count = sheet.max_row
data = np.zeros((row_count, 2))

index = 0
for r in ws.rows:
    data[index,0] = r[0].value
    data[index,1] = r[1].value
    index += 1


# Compute DBSCAN
db = DBSCAN(eps=5, min_samples=0.1).fit(data)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)

clusters = [data[labels == i] for i in range(n_clusters_)]
outliers = data[labels == -1]

# #############################################################################
# Plot result

# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
          for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 0.5]

    class_member_mask = (labels == k)

    xy = data[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=14)

    xy = data[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)

plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()

Tags: incoreimportfordbdataindexlabels