Can I use KMeans to predict possible diseases from a dataset?

    import pandas as pd                  # library for reading the dataset
    from sklearn.cluster import KMeans   # scikit-learn implementation of k-means

    # Read the dataset from the CSV file and store it in a variable called data.
    data = pd.read_csv(r"C:\Users\Hassan Tariq\Disease Prediction\DataSet.csv")

    # Select the data columns from the dataset.
    X_Data = data.iloc[:, [1]]      # first column as the first variable
    Y_Data = data.iloc[:, [2, 3]]   # second and third columns as the second variable
    # I have used two columns in the second variable because we cannot train
    # kmeans on three parameters.

    # Initialize the model with 3 initial clusters.
    model1 = KMeans(n_clusters=3, random_state=3)

    # Train the model on the selected data.
    prediction = model1.fit_predict(X_Data, Y_Data)

    # Print the cluster predictions from the model.
    print("Clustered Dataset: \n", prediction)

    # Print the centroids, which show the data behavior in each cluster.
    print("Centroids of the clusters formed: \n", model1.cluster_centers_)
    centeroids_collection = model1.cluster_centers_

    # Specify the diseases which can be possible.
    disease1 = ['Muscle Twitching', 'Nausea']
    disease2 = ['Eye Irritation', 'Lung Irritation']
    disease3 = ['Eye Irritation', 'Diarrhea']

    # Loop for iterating over all the data in the dataset to predict the disease..

1 Answer

Community user

Answer #1 · Posted on 2024-04-16 05:36:54

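Don't hard-code the number of clusters. First use the elbow method to estimate how many clusters the data supports, then fit the model with that number; the resulting cluster assignments should be more accurate. Sample code for obtaining the clusters follows: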

    from sklearn.preprocessing import StandardScaler

    X_std = StandardScaler().fit_transform(data)

Run the local implementation of kmeans. Here we test 3 clusters:

    # Kmeans is the local implementation referred to above, not sklearn.cluster.KMeans.
    km = Kmeans(n_clusters=3, max_iter=100, random_state=42)
    km.fit(X_std)
    centroids = km.centroids
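If the local `Kmeans` class is not available, roughly the same steps can be done with scikit-learn. The sketch below is only an illustration under stated assumptions: it swaps in `sklearn.cluster.KMeans` for the local implementation, reads the centroids from `cluster_centers_` rather than `centroids`, and uses synthetic data (three made-up 2-D blobs) in place of the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption): three well-separated 2-D blobs.
rng = np.random.default_rng(42)
data = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0.0, 5.0), scale=0.5, size=(50, 2)),
])

# Standardize so every feature contributes equally to the distance computation.
X_std = StandardScaler().fit_transform(data)

# scikit-learn's KMeans, used here instead of the local Kmeans class.
km = KMeans(n_clusters=3, max_iter=100, n_init=10, random_state=42)
km.fit(X_std)

centroids = km.cluster_centers_   # array of shape (n_clusters, n_features)
labels = km.labels_               # cluster index assigned to each training row
```

Note that the cluster indices are arbitrary: k-means groups similar rows but does not know which group corresponds to which disease, so any mapping from cluster to disease has to come from inspecting the centroids or from labeled examples.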

`labels_` is equivalent to calling fit(x) followed by predict:

    labels_ = km.predict(X_std)
    labels_
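The elbow method recommended at the top of this answer can be sketched as follows. This is a minimal illustration, not the original poster's code: it assumes scikit-learn's `KMeans` and its `inertia_` attribute (the within-cluster sum of squared distances), and it uses synthetic blobs instead of the real dataset. The range of `k` values is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0.0, 5.0), scale=0.5, size=(50, 2)),
])
X_std = StandardScaler().fit_transform(data)

# Fit k-means for a range of cluster counts and record the inertia of each fit.
k_values = list(range(1, 9))
inertias = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_std)
    inertias.append(km.inertia_)

# Inertia always shrinks as k grows; the "elbow" is where the drop levels off.
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.2f}")
```

Plotting `inertias` against `k_values` (for example with matplotlib) usually makes the bend easier to spot. The elbow is a heuristic, so on real data the choice of `k` may still need a judgment call.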

