print(x)
Here "x" holds the independent variables.
       Restaurant  Cuisines  Average_Cost    Rating     Votes   Reviews      Area
0        3.526361  0.693147      5.303305  1.504077  2.564949  1.609438  7.214504
1        1.386294  4.127134      4.615121  1.504077  2.484907  1.609438  5.905362
2        2.772589  1.386294      5.017280  1.526056  4.605170  3.433987  6.131226
3        3.912023  2.833213      5.525453  1.547563  5.176150  4.564348  7.643483
4        3.526361  2.708050      5.303305  1.435085  5.948035  5.046646  6.126869
...           ...       ...           ...       ...       ...       ...       ...
11089    3.912023  0.693147      5.525453  1.648659  5.789960  5.046646  3.135494
11090    1.386294  6.028279      4.615121  1.526056  3.610918  2.833213  7.643483
11091    1.386294  2.397895      4.615121  1.504077  3.828641  2.944439  5.814131
11092    1.386294  6.028279      4.615121  1.410987  3.218876  2.302585  5.905362
11093    1.386294  6.028279      4.615121  1.029619  0.000000  0.000000  5.564520
11094 rows × 7 columns
Here "y" is the target variable, and it has multiple classes.
30 minutes 7406
45 minutes 2665
65 minutes 923
120 minutes 62
20 minutes 20
80 minutes 14
10 minutes 4
Name: Delivery_Time, dtype: int64
Looking at the target variable, we can see that the "30 minutes" class has a far higher count than any of the other classes.
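To put a number on that imbalance, here is a small sketch that only uses the counts printed above (the variable names are my own, not from the original post). It prints each class's share of the rows and the ratio between the largest and smallest class.

```python
from collections import Counter

# Class counts copied from the Delivery_Time output above.
class_counts = Counter({
    "30 minutes": 7406,
    "45 minutes": 2665,
    "65 minutes": 923,
    "120 minutes": 62,
    "20 minutes": 20,
    "80 minutes": 14,
    "10 minutes": 4,
})

total = sum(class_counts.values())
majority_label, majority_count = class_counts.most_common(1)[0]
minority_count = min(class_counts.values())
imbalance_ratio = majority_count / minority_count

# Share of each class, largest first.
for label, count in class_counts.most_common():
    print(f"{label:>12}: {count:>5} ({count / total:6.2%})")
print(f"majority/minority ratio: {imbalance_ratio:.1f}")
```

With a pandas Series you would normally get the same shares from y.value_counts(normalize=True).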
To balance the classes, I tried SMOTETomek to oversample my data. Below is the code I used and the error I got.
from imblearn.combine import SMOTETomek
smk = SMOTETomek(ratio = 1)
x_res, y_res = smk.fit_sample(x,y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-426e8b86623d> in <module>()
1 from imblearn.combine import SMOTETomek
2 smk = SMOTETomek(ratio = 1)
----> 3 x_res, y_res = smk.fit_sample(x,y)
2 frames
/usr/local/lib/python3.6/dist-packages/imblearn/utils/_validation.py in _sampling_strategy_float(sampling_strategy, y, sampling_type)
311 if type_y != 'binary':
312 raise ValueError(
--> 313 '"sampling_strategy" can be a float only when the type '
314 'of target is binary. For multi-class, use a dict.')
315 target_stats = _count_class_sample(y)
ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict.
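I think you should keep the original class proportions of the target variable. SMOTE may give you better-looking results on the test dataset, but the model may then fail on new data entered by users (live data).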
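One way to act on this is to split the data so that the test part keeps the original class proportions, and to resample only the training part afterwards if you resample at all. Below is a minimal standard-library sketch of such a stratified split. The function name and the toy labels are made up for illustration, and in practice scikit-learn's train_test_split with stratify=y does the same job.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within the class only
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (illustrative only, not the real dataset).
labels = ["30 minutes"] * 70 + ["45 minutes"] * 25 + ["65 minutes"] * 5
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(test_counts)
```

Any oversampling would then be fitted on the rows in train_idx only, so the rows in test_idx still look like the data the model will see live.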
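Whether or not to use SMOTE is up to you. You can use this code:
You can see the actual SMOTE implementation here: https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/imblearn/utils/_validation.py#L355
You just need to pass a dict, as the error message says. Note, however, that the SMOTE algorithm already handles the multi-class setting internally.
Run: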
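The original snippet did not survive on this page, so what follows is only a sketch of what passing a dict could look like, not the answerer's exact code. It assumes a recent imbalanced-learn release, where ratio has been renamed to sampling_strategy and fit_sample to fit_resample. The helper names are my own. The k_neighbors cap is also my assumption: the "10 minutes" class has only 4 rows, which is fewer than SMOTE's default of 5 neighbours needs.

```python
from collections import Counter


def balanced_strategy(y):
    """Map every class label to the majority-class count (a dict for sampling_strategy)."""
    counts = Counter(y)
    target = max(counts.values())
    return {label: target for label in counts}


def safe_k_neighbors(y, default=5):
    """Cap k_neighbors so it stays below the size of the smallest class."""
    smallest = min(Counter(y).values())
    return max(1, min(default, smallest - 1))


def resample_balanced(x, y, random_state=42):
    """Oversample with SMOTE, then clean with Tomek links (needs imbalanced-learn installed)."""
    from imblearn.combine import SMOTETomek
    from imblearn.over_sampling import SMOTE

    smote = SMOTE(
        sampling_strategy=balanced_strategy(y),  # dict instead of a float
        k_neighbors=safe_k_neighbors(y),
        random_state=random_state,
    )
    smk = SMOTETomek(smote=smote, random_state=random_state)
    return smk.fit_resample(x, y)  # fit_resample replaces the older fit_sample
```

You would call it as x_res, y_res = resample_balanced(x, y). Since SMOTE handles multi-class targets on its own, leaving sampling_strategy at its default 'auto' should also work without building the dict by hand.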