How to make use of OneHotEncoder's array output

array([[22.        ,  0.        ,  1.        , ...,  1.        ,  0.        ,  7.25      ],
       [38.        ,  1.        ,  0.        , ...,  1.        ,  0.        , 71.2833    ],
       [26.        ,  1.        ,  0.        , ...,  0.        ,  0.        ,  7.925     ],
       ...,
       [29.69911765,  1.        ,  0.        , ...,  1.        ,  2.        , 23.45      ],
       [26.        ,  0.        ,  1.        , ...,  0.        ,  0.        , 30.        ],
       [32.        ,  0.        ,  1.        , ...,  0.        ,  0.        ,  7.75      ]])

1 Answer

User

#1 · Posted 2024-06-02 07:55:12

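Your intuition is right: pandas.get_dummies() is much easier to use, but the advantage of OneHotEncoder is that it always applies the same transformation to unseen data. You can also export the fitted instance with pickle or joblib and load it in other scripts.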

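There is a way to attach the encoded columns directly back to the original pandas.DataFrame. Personally, I do it the long way: I fit the encoder, transform the data, join the output back onto the DataFrame, and drop the original columns.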

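import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Columns to encode
cols = ['Sex', 'Embarked']

# Initialize encoder
ohe = OneHotEncoder()

# Fit to data
ohe.fit(df[cols])

# Declare encoded data as new columns in `df`
# (transform() returns a sparse matrix by default, so convert it to a dense array;
#  get_feature_names_out() requires scikit-learn >= 1.0)
df[ohe.get_feature_names_out()] = ohe.transform(df[cols]).toarray()

# Drop unencoded columns
df.drop(cols, axis=1, inplace=True)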

Finally, I noticed you said:

I feel pretty confident in using it in combination with fit_transform so that the results can also be fit to the test dataframe.

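I want to point out that you should not fit the encoder again! Instead, use ohe.transform(X_test[cols]) when handling new data. Do not call fit_transform() a second time, or the resulting columns may differ from one dataset to the next.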
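To see why refitting is a problem, here is a small sketch with made-up train and test frames (`X_train`, `X_test`, and their values are hypothetical, and `handle_unknown="ignore"` is my own choice rather than something from the question). The test set happens to contain fewer categories than the training set.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

cols = ["Sex", "Embarked"]

# Hypothetical data: the test set contains only a subset of the categories
X_train = pd.DataFrame({"Sex": ["male", "female", "female"],
                        "Embarked": ["S", "C", "Q"]})
X_test = pd.DataFrame({"Sex": ["male", "male"],
                       "Embarked": ["S", "S"]})

# Fit once on the training data, then only transform the test data
ohe = OneHotEncoder(handle_unknown="ignore")
train_enc = ohe.fit_transform(X_train[cols]).toarray()
test_enc = ohe.transform(X_test[cols]).toarray()

# What NOT to do: refitting on the test data learns a different set of columns
refit_enc = OneHotEncoder().fit_transform(X_test[cols]).toarray()

# pandas.get_dummies() has the same limitation on the test frame alone
dummies = pd.get_dummies(X_test[cols])

print(train_enc.shape, test_enc.shape, refit_enc.shape, dummies.shape)
```

With the encoder fitted once, the train and test outputs share the same five columns. Refitting on the test frame, or calling get_dummies() on it, yields only two columns, which would no longer line up with a model trained on the original encoding.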
