Can I use KMeans to predict possible diseases from a dataset?

Posted 2024-04-16 05:36:54


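Below is the code I wrote to predict possible diseases with k-Means from a dataset that has 3 parameters. Is it correct? It does not give me the accurate results I want.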

import pandas as pd #importing library for reading dataset
from sklearn.cluster import KMeans #using ML library in python for utilizing kmeans


##reading the dataset from csv file and storing in variable called data..
data = pd.read_csv(r"C:\Users\Hassan Tariq\Disease Prediction\DataSet.csv")

##selecting data cols from dataset.
X_Data = data.iloc[:,[1]] #first col as a part of first variable
Y_Data = data.iloc[:,[2,3]] ##second col as a part of second variable
##i have used two cols in second variable because we cannot train kmeans
##on three parameters.


#initializing the model with 3 initial clusters.
model1 = KMeans(n_clusters=3, random_state=3)

#training model on the selected data..
prediction = model1.fit_predict(X_Data,Y_Data)

#printing the clusters prediction from the model.
print("Clustered Dataset: \n",prediction)

#printing the centroids which shows the data behavior in each cluster
print("Centroids of the clusters formed: \n",model1.cluster_centers_)

centeroids_collection = model1.cluster_centers_

#specifying the diseases which can be possible.
disease1 = ['Muscle Twitching','Nausea']
disease2 = ['Eye Irritation', 'Lung Irritation']
disease3 = ['Eye Irritation','Diarrhea']

#loop for iterating all the data in the dataset to predict the disease..
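One point worth checking in the code above: in scikit-learn, `KMeans.fit_predict(X, y=None)` ignores its `y` argument (it exists only for API consistency), so `model1.fit_predict(X_Data, Y_Data)` clusters on column 1 alone and `Y_Data` has no effect. k-means itself accepts any number of feature columns. The sketch below is a minimal illustration under assumptions: the DataFrame is a hypothetical stand-in for `DataSet.csv` (the column names `param_a`, `param_b`, `param_c` and the random values are made up), and it clusters on all three parameters at once.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for DataSet.csv: an ID column plus three numeric parameters,
# generated as three well-separated groups of 30 rows each.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "id": np.arange(90),
    "param_a": np.concatenate([rng.normal(m, 0.5, 30) for m in (0, 5, 10)]),
    "param_b": np.concatenate([rng.normal(m, 0.5, 30) for m in (10, 0, 5)]),
    "param_c": np.concatenate([rng.normal(m, 0.5, 30) for m in (5, 10, 0)]),
})

# Pass all three feature columns together as one 2-D input; k-means has no target variable.
features = data.iloc[:, [1, 2, 3]]
model = KMeans(n_clusters=3, n_init=10, random_state=3)
labels = model.fit_predict(features)

print(labels[:10])
# One centroid per cluster, one coordinate per feature column.
print(model.cluster_centers_.shape)
```

The cluster numbers 0, 1 and 2 are arbitrary: k-means does not know which cluster corresponds to which of the disease lists, so any mapping from cluster to disease has to come from outside the algorithm.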

1 answer
Anonymous user
#1 · Posted 2024-04-16 05:36:54

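Instead of hard-coding the number of clusters, first use the elbow method to find it. Once you have the number of clusters, fit the model with that value, and your predictions should be more accurate. Example code for getting the clusters: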
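The elbow method fits k-means for a range of k and records the inertia (the within-cluster sum of squared distances, exposed by scikit-learn as `inertia_`). A minimal sketch, assuming synthetic data from `make_blobs` in place of the real dataset; `elbow_inertias` is a hypothetical helper name:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def elbow_inertias(X, k_values):
    """Hypothetical helper: fit k-means for each k and return {k: inertia}."""
    inertias = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=42)
        km.fit(X)
        inertias[k] = km.inertia_  # within-cluster sum of squared distances
    return inertias

# Synthetic data with 3 true groups, standing in for the real dataset.
X, _ = make_blobs(n_samples=300, centers=3, n_features=3, random_state=0)

for k, inertia in elbow_inertias(X, range(1, 9)).items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Inertia keeps shrinking as k grows, so the goal is not its minimum: plot it against k and pick the value where the curve bends, after which extra clusters reduce inertia only marginally.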

from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(data)
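`StandardScaler` rescales each column to zero mean and unit variance, so a parameter measured on a large scale does not dominate the Euclidean distances that k-means relies on. It works only on numeric columns, so if the dataset's first column is an identifier, it is probably best dropped before scaling. The small array below is made up purely to show the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up example: the second column is on a much larger scale than the first.
data = np.array([
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 300.0],
    [4.0, 100.0],
])

X_std = StandardScaler().fit_transform(data)

# After scaling, each column has mean ~0 and population standard deviation ~1.
print(X_std.mean(axis=0))
print(X_std.std(axis=0))
```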

Run the local implementation of k-means; here we test 3 clusters:

km = Kmeans(n_clusters=3, max_iter=100, random_state=42)
km.fit(X_std)
centroids = km.centroids

labels_ is equivalent to calling fit(X) and then predict(X):

labels_ = km.predict(X_std)
labels_
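The `Kmeans` class with a `centroids` attribute in the answer is a hand-written implementation that is not shown on this page. With scikit-learn's `KMeans`, which the question already imports, the centroids live in `cluster_centers_` and the training labels in `labels_`. The sketch below uses synthetic data as an assumption and checks both relationships described above: `labels_` matches `predict` on the training data, and `fit_predict` gives the same labels as `fit` followed by `predict`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real dataset, standardized as in the answer.
X, _ = make_blobs(n_samples=200, centers=3, n_features=3, random_state=1)
X_std = StandardScaler().fit_transform(X)

# Test 3 clusters with the answer's parameters, using scikit-learn's KMeans.
km = KMeans(n_clusters=3, max_iter=100, n_init=10, random_state=42)
km.fit(X_std)
centroids = km.cluster_centers_  # shape (n_clusters, n_features)

# Labels assigned during fit agree with predict() on the same data.
labels_ = km.predict(X_std)
print(np.array_equal(labels_, km.labels_))

# fit_predict(X) is a shortcut for fit(X) followed by predict(X).
labels_fp = KMeans(n_clusters=3, max_iter=100, n_init=10, random_state=42).fit_predict(X_std)
print(np.array_equal(labels_fp, labels_))
```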
