I am trying to balance my data. My target variable has multiple classes, and I want to oversample so that the classes are balanced.

       Restaurant  Cuisines  Average_Cost    Rating     Votes   Reviews      Area
0        3.526361  0.693147      5.303305  1.504077  2.564949  1.609438  7.214504
1        1.386294  4.127134      4.615121  1.504077  2.484907  1.609438  5.905362
2        2.772589  1.386294      5.017280  1.526056  4.605170  3.433987  6.131226
3        3.912023  2.833213      5.525453  1.547563  5.176150  4.564348  7.643483
4        3.526361  2.708050      5.303305  1.435085  5.948035  5.046646  6.126869
...           ...       ...           ...       ...       ...       ...       ...
11089    3.912023  0.693147      5.525453  1.648659  5.789960  5.046646  3.135494
11090    1.386294  6.028279      4.615121  1.526056  3.610918  2.833213  7.643483
11091    1.386294  2.397895      4.615121  1.504077  3.828641  2.944439  5.814131
11092    1.386294  6.028279      4.615121  1.410987  3.218876  2.302585  5.905362
11093    1.386294  6.028279      4.615121  1.029619  0.000000  0.000000  5.564520

11094 rows × 7 columns

30 minutes     7406
45 minutes     2665
65 minutes      923
120 minutes      62
20 minutes       20
80 minutes       14
10 minutes        4
Name: Delivery_Time, dtype: int64

To balance the classes, I tried SMOTETomek to oversample my data. Below is the code I ran and the error I got.

from imblearn.combine import SMOTETomek
smk = SMOTETomek(ratio = 1)
x_res, y_res = smk.fit_sample(x,y)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-426e8b86623d> in <module>()
      1 from imblearn.combine import SMOTETomek
      2 smk = SMOTETomek(ratio = 1)
----> 3 x_res, y_res = smk.fit_sample(x,y)

2 frames
/usr/local/lib/python3.6/dist-packages/imblearn/utils/_validation.py in _sampling_strategy_float(sampling_strategy, y, sampling_type)
    311         if type_y != 'binary':
    312             raise ValueError(
--> 313                 '"sampling_strategy" can be a float only when the type '
    314                 'of target is binary. For multi-class, use a dict.')
    315     target_stats = _count_class_sample(y)

ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict.

2 Answers

User

Answer 1 · edited 2024-06-06 14:46:47

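I think you should keep the original class proportions of your target variable. SMOTE may give you improved results on the test dataset, but the model may then fail on new (real-time) data entered by users.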

Whether or not to use SMOTE is up to you. If you do, you can use this code:

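from imblearn.over_sampling import SMOTE

# "minority" resamples only the smallest class; fit_resample replaces the older fit_sample
smote = SMOTE(sampling_strategy="minority")
X, Y = smote.fit_resample(x_train_data, y_train_data)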

User

Answer 2 · edited 2024-06-06 14:46:47

You can see the actual SMOTE implementation here: https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/imblearn/utils/_validation.py#L355

You just need to pass a dict, as the error message says. That said, the SMOTE algorithm handles the multi-class setting internally.

Run:

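from imblearn.over_sampling import SMOTE

# "minority" resamples only the smallest class; fit_resample replaces the older fit_sample
smote = SMOTE(sampling_strategy="minority")
X, Y = smote.fit_resample(x_train, y_train)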
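
The error message asks for a dict when the target is multi-class. Below is a minimal sketch of what passing one might look like. The helper names `balanced_strategy`, `safe_k_neighbors` and `resample_multiclass` are hypothetical and not part of imbalanced-learn. The choice to raise every class to the size of the largest one is an assumption, as is capping `k_neighbors`. The cap is there because SMOTE's default of 5 neighbors cannot be satisfied by a class with only 4 rows.

```python
from collections import Counter


def balanced_strategy(y, target=None):
    """Return a {class_label: n_samples} dict for over-sampling.

    Hypothetical helper: every class is raised to `target` samples
    (default: the size of the largest class). Classes already at or
    above the target keep their count, since over-sampling cannot
    shrink a class.
    """
    counts = Counter(y)
    if target is None:
        target = max(counts.values())
    return {label: max(n, target) for label, n in counts.items()}


def safe_k_neighbors(y, default=5):
    """Hypothetical helper: cap k_neighbors below the smallest class size.

    SMOTE looks for k_neighbors other samples of the same class, so a
    class with 4 rows can use at most 3. A class with a single row gives 0,
    which SMOTE cannot use; such a class needs different handling.
    """
    smallest = min(Counter(y).values())
    return min(default, smallest - 1)


def resample_multiclass(x, y, seed=0):
    # Imported lazily so the two helpers above work without imbalanced-learn.
    from imblearn.combine import SMOTETomek
    from imblearn.over_sampling import SMOTE

    # The dict goes on the SMOTE object, which SMOTETomek then uses as-is.
    smote = SMOTE(
        sampling_strategy=balanced_strategy(y),
        k_neighbors=safe_k_neighbors(y),
        random_state=seed,
    )
    # Over-sample with SMOTE first, then remove Tomek links.
    return SMOTETomek(smote=smote, random_state=seed).fit_resample(x, y)
```

With the class counts from the question, `balanced_strategy(y)` would map every class to 7406 and `safe_k_neighbors(y)` would return 3, because the smallest class ("10 minutes") has only 4 rows. Calling `resample_multiclass(x_train, y_train)` on the training split only leaves the test data at its original class proportions.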
