Learning scikit-learn, stuck on an error in my code: "Encoders require their input to be uniformly strings or numbers"

Posted 2024-03-29 01:58:47


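I have been learning Python from YouTube videos. I am new to Python and just a beginner. I saw this code in a video and tried it, but I got an error that I don't know how to fix. Below is the code I am having trouble with. I have not included the entire code because it is too long.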

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn import svm
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
%matplotlib inline


wine = pd.read_csv('wine_quality.csv')
wine.head()
wine.info()
wine.isnull().sum()

#Preprocessing
bins=(2,6.5,8)
group_names=['bad','good']
wine['quality'] = pd.cut(wine['quality'], bins=bins, labels=group_names)
wine['quality'].unique()

label_quality=LabelEncoder()
wine['quality']=label_quality.fit_transform(wine['quality'])
# After the line above, I get the error below

'''TypeError                                 Traceback (most recent call last)
~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in _encode(values, uniques, encode, check_unknown)
    112         try:
--> 113             res = _encode_python(values, uniques, encode)
    114         except TypeError:

~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in _encode_python(values, uniques, encode)
     60     if uniques is None:
---> 61         uniques = sorted(set(values))
     62         uniques = np.array(uniques, dtype=values.dtype)

TypeError: '<' not supported between instances of 'float' and 'str'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-14-8e211b2c4bf8> in <module>
----> 1 wine['quality'] = label_quality.fit_transform(wine['quality'])

~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in fit_transform(self, y)
    254         """
    255         y = column_or_1d(y, warn=True)
--> 256         self.classes_, y = _encode(y, encode=True)
    257         return y
    258 

~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in _encode(values, uniques, encode, check_unknown)
    115             types = sorted(t.__qualname__
    116                            for t in set(type(v) for v in values))
--> 117             raise TypeError("Encoders require their input to be uniformly "
    118                             f"strings or numbers. Got {types}")
    119         return res

TypeError: Encoders require their input to be uniformly strings or numbers. Got ['float', 'str']'''

Please help me fix this error. It would be great if you could tell me exactly what to do.


1 Answer
User
#1 · Posted 2024-03-29 01:58:47

I checked the wine quality dataset, and when I ran the following:

wine['quality'].unique()

I got this output:

array([6, 5, 7, 8, 4, 3, 9], dtype=int64)

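The column contains a value (9) that exceeds the upper limit of the bins you passed to pd.cut(), and any value outside the bins is replaced with NaN. I confirmed this in my own environment. After running your preprocessing step: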

#Preprocessing
bins=(2,6.5,8)
group_names=['bad','good']
wine['quality'] = pd.cut(wine['quality'], bins=bins, labels=group_names)
wine['quality'].unique()

the result of wine['quality'].unique() was:

['bad', 'good', NaN]
Categories (2, object): ['bad' < 'good']
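The NaN can be reproduced without the CSV file. The following is only a minimal sketch: the `quality` Series is a hypothetical stand-in holding the seven scores listed above, not the real dataset, and `n_missing` and `value_types` are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the 'quality' column: the seven distinct scores
# reported above, not the real wine_quality.csv data.
quality = pd.Series([6, 5, 7, 8, 4, 3, 9])

# The bins define the intervals (2, 6.5] and (6.5, 8]; the score 9 is in neither.
binned = pd.cut(quality, bins=(2, 6.5, 8), labels=['bad', 'good'])

# Count the scores that fell outside the bins and became NaN.
n_missing = int(binned.isna().sum())

# Collect the Python types present in the column once it is viewed as plain objects.
value_types = sorted({type(v).__name__ for v in binned.astype(object)})

print(n_missing)
print(value_types)
```

One score ends up missing, so the column mixes string labels with a NaN, which is the mixed-type situation the encoder complains about.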

This is because every value above 8 (the upper limit you provided) is changed to NaN, which is also mentioned in the documentation for pd.cut():

"Out of bounds values will be NA in the resulting Series or Categorical object."

I would therefore suggest increasing the upper bound in bins to 9. I tried that, and the function works without any issues:

#Preprocessing
bins=(2,6.5,9)
group_names=['bad','good']
wine['quality'] = pd.cut(wine['quality'], bins=bins, labels=group_names)
wine['quality'].unique()

Now the output of wine['quality'].unique() is:

['bad', 'good']
Categories (2, object): ['bad' < 'good']

We no longer have any NaN values, so your label encoder should now work correctly.
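To check the whole step end to end, here is a rough sketch under the same assumption of a hand-made `quality` Series rather than the real CSV. The NaN guard and the `astype(str)` conversion are additions for illustration and are not part of the original code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample scores, not the real wine_quality.csv data.
quality = pd.Series([6, 5, 7, 8, 4, 3, 9])

# Upper edge raised to 9 so that every score falls inside a bin.
binned = pd.cut(quality, bins=(2, 6.5, 9), labels=['bad', 'good'])

# Fail early with a clear message if any score is still outside the bins.
if binned.isna().any():
    raise ValueError("some quality scores fall outside the bins")

# Convert the categorical labels to plain strings before encoding.
encoder = LabelEncoder()
encoded = encoder.fit_transform(binned.astype(str))

print(list(encoder.classes_))
print(encoded.tolist())
```

With these sample scores, 'bad' is encoded as 0 and 'good' as 1. The guard matters because `astype(str)` would otherwise turn a leftover NaN into the string 'nan' and quietly encode it as a third class.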
