Biclustering data in Python while keeping column names: how do I preserve the column labels?

Posted 2024-04-23 14:52:55


I'm trying to apply biclustering to a dataset. I'm following this guide:

import numpy as np
from matplotlib import pyplot as plt
import pandas as pd

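from sklearn.datasets import make_biclusters
# sklearn.cluster.bicluster was removed in scikit-learn 0.24; import from sklearn.cluster
from sklearn.cluster import SpectralCoclustering

# make some fake data for this question
data, rows, columns = make_biclusters(
    shape=(20, 20), n_clusters=2, noise=5,
    shuffle=False, random_state=0)

# shuffle rows and columns; the old private helper
# sklearn.datasets.samples_generator._shuffle no longer exists
rng = np.random.RandomState(0)
row_idx = rng.permutation(data.shape[0])
col_idx = rng.permutation(data.shape[1])
data = data[row_idx][:, col_idx]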

# my real data is in a pandas df WITH column names. These are, of course, just placeholders
df = pd.DataFrame(data)
colum_names = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t']
df.columns = colum_names

# Converting from pandas to NumPy drops the column labels
data = df.to_numpy()

# show the data with column labels.
# There was no reordering, so the labels are still correct

plt.imshow(data)
plt.xticks(range(0,len(colum_names)),colum_names)
plt.yticks(range(0,len(colum_names)),colum_names)
plt.title("Original dataset")

[Image: plot titled "Original dataset"]
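The comment about losing the labels is true for the array itself, but the names are still available: `df.columns` lists them in the same order as the array's columns, so column `j` of `df.to_numpy()` is the column named `df.columns[j]`. A minimal sketch with a small, made-up 2×3 frame (not the question's data):

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the real labelled data
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "c"])

arr = df.to_numpy()            # plain ndarray: no column labels attached
names = df.columns.to_numpy()  # labels kept separately, same column order

# Column j of the array holds the values of the column named names[j]
for j, name in enumerate(names):
    assert (arr[:, j] == df[name].to_numpy()).all()

print(type(arr).__name__, list(names))
```

So any permutation applied to the array's columns can be applied to `names` as well to keep the two aligned.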

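Now I apply the biclustering model and reorder the rows and columns by cluster. This "shuffles" the columns/rows, so the axis labels are no longer correct: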

model = SpectralCoclustering(2)
model.fit(data)

fit_data = data[np.argsort(model.row_labels_)]
fit_data = fit_data[:, np.argsort(model.column_labels_)]

plt.imshow(fit_data)
plt.title("After biclustering; rearranged to show biclusters")

plt.xticks(range(0,len(colum_names)),colum_names)
plt.yticks(range(0,len(colum_names)),colum_names)

plt.colorbar()

[Image: plot titled "After biclustering; rearranged to show biclusters"]
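The reordering in the code above works because `np.argsort(labels)` returns the indices that would sort the cluster labels, i.e. a permutation that lists every row (or column) of cluster 0 first, then cluster 1, and so on. Indexing with that permutation moves whole rows or columns, so any other sequence indexed with the same permutation stays aligned with them. A small worked example with made-up labels; `kind="stable"` is an addition here (the question's code uses the default sort) so that items inside one cluster keep their original relative order:

```python
import numpy as np

# Made-up cluster ids for 6 columns (playing the role of model.column_labels_)
column_labels = np.array([1, 0, 1, 0, 0, 1])

# Permutation grouping the columns by cluster; a stable sort keeps the
# original left-to-right order within each cluster
order = np.argsort(column_labels, kind="stable")
print(order)                 # [1 3 4 0 2 5]
print(column_labels[order])  # [0 0 0 1 1 1]
```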

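My question: how do I apply the same reordering that was applied to the columns to the column labels as well, so that the labels in the reordered plot are correct?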
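One possible approach (a sketch, not from the original post): compute each `argsort` permutation once and apply it to both the values and the names. If the data stays in the DataFrame, `df.iloc[row_order, col_order]` does this in one step, because positional indexing carries the index and column names along with the values. The helper names `reorder_by_biclusters` and `reorder_labels` and the tiny example frame are hypothetical; the label lists stand in for `model.row_labels_` and `model.column_labels_`.

```python
import numpy as np
import pandas as pd


def reorder_by_biclusters(df, row_labels, column_labels):
    """Group rows and columns of df by cluster id (hypothetical helper).

    row_labels / column_labels play the role of model.row_labels_ and
    model.column_labels_. iloc reorders by position and keeps each value
    attached to its index label and column name.
    """
    row_order = np.argsort(row_labels, kind="stable")
    col_order = np.argsort(column_labels, kind="stable")
    return df.iloc[row_order, col_order]


def reorder_labels(names, labels):
    """NumPy-only variant: reorder a list of names with the same permutation."""
    return np.asarray(names)[np.argsort(labels, kind="stable")]


# Tiny made-up example: 3 rows, 4 columns, hypothetical cluster ids
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=["x", "y", "z"], columns=["a", "b", "c", "d"])
fit_df = reorder_by_biclusters(df, row_labels=[1, 0, 1],
                               column_labels=[1, 0, 0, 1])
print(fit_df)
print(reorder_labels(["a", "b", "c", "d"], [1, 0, 0, 1]))  # ['b' 'c' 'a' 'd']
```

With the question's variables this would be `fit_df = reorder_by_biclusters(df, model.row_labels_, model.column_labels_)`, followed by `plt.imshow(fit_df.to_numpy())`, `plt.xticks(range(len(fit_df.columns)), fit_df.columns)` and `plt.yticks(range(len(fit_df.index)), fit_df.index)`. Since the question's rows have no names, `fit_df.index` then shows the original row positions, which is also handy for tracing rows back. For the NumPy-only route, `reorder_labels(colum_names, model.column_labels_)` gives the x tick labels; use the same `argsort` call (same `kind`) for the data and the names, because mixing the default sort with `kind="stable"` could order tied labels differently.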

