Getting the error "Shape of passed values is (8708, 27), indices imply (8708, 4)" when applying OneHotEncoder to a DataFrame

Posted 2024-06-16 09:30:28


I am practicing OneHotEncoder on the DataFrame sampled below:

datetime        season  holiday  workingday  weather             temp   atemp   humidity  windspeed  Total_booking  Hour  weekday    Month      date
5/2/2012 19:00  Summer  0        1           Clear + Few clouds  22.14  25.76   77        16.9979    504            19    Wednesday  May        5/2/2012
9/5/2012 4:00   Fall    0        1           Clear + Few clouds  28.7   33.335  79        19.0012    5              4     Wednesday  September  9/5/2012

Code:

"df" is the DataFrame sampled above, and categoryVariableList is the list of columns in df that should be passed to OneHotEncoder.

categoryVariableList = ["weekday","Month","season","weather"]

ohe = OneHotEncoder(categories='auto')
feature_arr = ohe.fit_transform(df[categoryVariableList]).toarray()
feature_labels = ohe.categories_

feature_labels = np.array(feature_labels).ravel()

features = pd.DataFrame(feature_arr, columns=feature_labels)
features

The output I get is:

ValueError: Wrong number of items passed 27, placement implies 4
.....
Shape of passed values is (8708, 27), indices imply (8708, 4)

What is going wrong here? Please advise.


2 answers

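Your feature_arr has 27 columns, but feature_labels has only 4 entries: ohe.categories_ is a list holding one array of categories per input column, not one label per encoded column. The number of column labels therefore does not match the data, and creating the pandas.DataFrame fails.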
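To make the mismatch concrete, here is a minimal sketch on a tiny made-up DataFrame (the values and the two-column subset are assumptions for illustration, not the asker's 8708-row data). It shows that categories_ has one entry per input column, and that concatenating those per-column arrays yields one label per encoded column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the asker's data; values are illustrative only.
df = pd.DataFrame({
    "weekday": ["Wednesday", "Monday", "Wednesday"],
    "season": ["Summer", "Fall", "Fall"],
})
cols = ["weekday", "season"]

ohe = OneHotEncoder(categories="auto")
feature_arr = ohe.fit_transform(df[cols]).toarray()

# categories_ is a list with one array per INPUT column ...
n_inputs = len(ohe.categories_)
# ... while the encoded array has one column per (input column, category) pair.
n_outputs = feature_arr.shape[1]

# Concatenating the per-column arrays gives one label per encoded column.
flat_labels = np.concatenate(ohe.categories_)
features = pd.DataFrame(feature_arr, columns=flat_labels)
print(n_inputs, n_outputs, list(features.columns))
```

One caveat with this approach: if two input columns share a category value, the concatenated labels collide, which is one reason the encoder's own feature-name method (which prefixes each label with its column name) is usually the safer choice.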

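You can use .get_feature_names() instead, which returns one label per encoded column. (In scikit-learn 1.0 and later the equivalent method is .get_feature_names_out(); .get_feature_names() was removed in 1.2.)

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

categoryVariableList = ["weekday","Month","season","weather"]

ohe = OneHotEncoder(categories='auto')
feature_arr = ohe.fit_transform(df[categoryVariableList]).toarray()
# One label per encoded column, prefixed with the source column name
feature_labels = ohe.get_feature_names(categoryVariableList)

# No longer needed: the labels are already a flat array
# feature_labels = np.array(feature_labels).ravel()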

features = pd.DataFrame(feature_arr, columns=feature_labels)
features

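Alternatively, you could use LabelEncoder instead:

le = LabelEncoder()  # assumed: from sklearn.preprocessing import LabelEncoder
df_x[category_cols_x] = df_x[category_cols_x].apply(lambda col: le.fit_transform(col))
df_x[category_cols_x]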
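A runnable sketch of this second suggestion follows. The DataFrame contents are made up for illustration, and df_x, category_cols_x and le are simply the names used in the answer. Note that this produces integer codes, not one-hot columns, and that the scikit-learn documentation describes LabelEncoder as intended for target labels; the integer codes also imply an ordering that the categories may not have.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the asker's data; values are illustrative only.
df_x = pd.DataFrame({
    "weekday": ["Wednesday", "Monday", "Wednesday"],
    "season": ["Summer", "Fall", "Fall"],
})
category_cols_x = ["weekday", "season"]

le = LabelEncoder()
# fit_transform is re-run for each column, so every column gets its own
# 0..n-1 codes (assigned in sorted order of the category values).
df_x[category_cols_x] = df_x[category_cols_x].apply(lambda col: le.fit_transform(col))
print(df_x[category_cols_x])
```

Because the single le object is refitted on each column, it only remembers the last column's mapping afterwards; keeping one encoder per column would be needed to invert the codes later.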
