LabelEncoder: TypeError: '>' not supported between instances of 'float' and 'str'

    le = preprocessing.LabelEncoder()
    categorical = list(df.select_dtypes(include=['object']).columns.values)
    for cat in categorical:
        print(cat)
        df[cat].fillna('UNK', inplace=True)
        df[cat] = le.fit_transform(df[cat])
    #     print(le.classes_)
    #     print(le.transform(le.classes_))

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-24-424a0952f9d0> in <module>()
          4     print(cat)
          5     df[cat].fillna('UNK', inplace=True)
    ----> 6     df[cat] = le.fit_transform(df[cat].fillna('UNK'))
          7 #     print(le.classes_)
          8 #     print(le.transform(le.classes_))

    C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in fit_transform(self, y)
        129         y = column_or_1d(y, warn=True)
        130         _check_numpy_unicode_bug(y)
    --> 131         self.classes_, y = np.unique(y, return_inverse=True)
        132         return y
        133

    C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts)
        209
        210     if optional_indices:
    --> 211         perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
        212         aux = ar[perm]
        213     else:

    TypeError: '>' not supported between instances of 'float' and 'str'

3 Answers

User

Answer 1 · edited 2024-05-15 04:38:23

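This happens because the Series df[cat] contains elements of different data types, for example strings and floats. That can come from how the data was read (numbers read as floats, text read as strings), or the column may have been float and changed type after the fillna operation. Sorting such a column forces a comparison between a float and a str, which Python 3 does not allow.

In other words:

The pandas dtype 'object' means the column holds arbitrary Python objects, possibly of mixed types. It does not mean every value is a str.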
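To check whether a column really mixes types, one option is to count the Python type of each element. The sketch below is only an illustration: the Series `s` and its values are made up, and a float NaN stands in for a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical column: three strings plus a float NaN for a missing value.
s = pd.Series(["red", "blue", np.nan, "red"], dtype=object)

# The dtype is object, which says nothing about the element types inside.
print(s.dtype)

# Count the Python type name of every element to expose the mix.
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# More than one distinct type means sorting the column can raise TypeError.
has_mixed_types = len(type_counts) > 1
```

If more than one type shows up, the column needs to be made uniform before it is encoded.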

So casting the column to str before encoding, as in the following line:

df[cat] = le.fit_transform(df[cat].astype(str))

should help.
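Put together with the loop from the question, the fix could look like the sketch below. The DataFrame contents and the `encoders` dict are assumptions for illustration. It also creates one LabelEncoder per column instead of reusing a single `le`, which is a design choice of this sketch and not something the answer prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame: one object column mixing str, float and a missing value.
df = pd.DataFrame(
    {"color": pd.Series(["red", 1.5, np.nan, "red"], dtype=object)}
)

encoders = {}  # hypothetical helper: keeps one fitted encoder per column
for cat in df.select_dtypes(include=["object"]).columns:
    le = LabelEncoder()
    # Fill missing values, then cast everything to str so sorting is well defined.
    values = df[cat].fillna("UNK").astype(str)
    df[cat] = le.fit_transform(values)
    encoders[cat] = le

print(df)
print(encoders["color"].classes_)
```

Keeping a separate encoder per column makes it possible to call `inverse_transform` later for any column. Assigning the result of `fillna` back, instead of calling it with `inplace=True` on `df[cat]`, also avoids relying on chained in-place modification, which recent pandas versions warn about.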

User

Answer 2 · edited 2024-05-15 04:38:23

Or cast to the uniform type str and split:

unique, counts = numpy.unique(str(a).split(), return_counts=True)
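Note that `str(a).split()` splits the printed representation of `a`, so depending on what `a` is, the tokens may carry brackets or quotes. A possibly safer variant of the same idea, assuming `a` can be turned into a NumPy object array, is to cast each element to str and then count. The array contents here are made up.

```python
import numpy as np

# Hypothetical mixed-type data: strings, a float and a NaN.
a = np.array(["x", 2.0, "x", np.nan], dtype=object)

# Convert every element to its string form so all values are comparable.
as_text = a.astype(str)

# np.unique sorts internally; with uniform str values this no longer fails.
unique, counts = np.unique(as_text, return_counts=True)
print(unique, counts)
```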

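User

Answer 3 · edited 2024-05-15 04:38:23

Because string data has variable length, pandas stores it as object dtype by default. I ran into this problem too, after handling missing values. In my case, converting all of those columns to the 'category' dtype before label encoding made it work.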

df[cat]=df[cat].astype('category')

Then check df.dtypes and run the label encoding.
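As a related sketch, once a column is categorical, pandas itself can supply integer codes through `.cat.codes`. This swaps LabelEncoder for pandas' own category codes, which is not what the answer describes, and the Series contents are assumptions. The cast to str before `astype('category')` is added here so that the categories sort consistently.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value.
s = pd.Series(["red", "blue", np.nan, "red"], dtype=object)

# Fill the gap, make the values uniform strings, then convert to category.
as_category = s.fillna("UNK").astype(str).astype("category")
print(as_category.dtype)

# Each category gets an integer code based on its position in the sorted categories.
codes = as_category.cat.codes
print(list(as_category.cat.categories))
print(codes.tolist())
```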
