Balancing data in Python

Posted 2024-06-11 09:49:56

Asked by: anonymous user ("Male | A programmer who enjoys writing Python code.")

For my deep learning task, I want to balance my data. The dataset is the "Breast Cancer Wisconsin (Diagnostic) Data Set", retrieved from https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

Currently the data is imbalanced: 357 cases are labeled benign and 212 are labeled malignant.
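The class counts above can be checked directly on the label column. The sketch below assumes scikit-learn's bundled copy of the same Wisconsin diagnostic data (`load_breast_cancer`), where `target` 0 means malignant and 1 means benign; it renames those to the `'M'`/`'B'` labels of the Kaggle CSV so the column looks like `diagnosis`. With the Kaggle file itself, `data['diagnosis'].value_counts()` does the same job.

```python
from sklearn.datasets import load_breast_cancer

# scikit-learn ships the same Wisconsin diagnostic data (569 samples, 30 features)
df = load_breast_cancer(as_frame=True).frame

# Map 0/1 targets to the Kaggle CSV's 'M'/'B' labels to mimic its 'diagnosis' column
df["diagnosis"] = df["target"].map({0: "M", 1: "B"})

counts = df["diagnosis"].value_counts()
print(counts)

# Majority/minority ratio: 357 / 212 ≈ 1.68
ratio = counts["B"] / counts["M"]
print(f"majority/minority ratio: {ratio:.2f}")
```

A ratio of about 1.7:1 is a mild imbalance, so it is worth comparing results with and without resampling rather than assuming SMOTE will help.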

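I would like to know the best way to balance the data. Is SMOTE a good choice for this problem? What code do I need to balance the diagnosis class column, and how do I specify that it is the "diagnosis" column that needs balancing? My attempt is below.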

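import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline as make_pipeline_imb

data = pd.read_csv('C:/Users/Vincent/Desktop/DeepLearning/data.csv')
# 'diagnosis' (M/B) is the class column to balance; drop the ID and the empty trailing column
y = data['diagnosis']
X = data.drop(columns=['id', 'diagnosis', 'Unnamed: 32'], errors='ignore')

classifier = RandomForestClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=2)

# SMOTE resamples only the training data during fit; predict() does not resample
smote_pipeline = make_pipeline_imb(SMOTE(random_state=4), classifier(random_state=42))
smote_model = smote_pipeline.fit(X_train, y_train)
smote_prediction = smote_model.predict(X_test)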
