How to use sklearn ColumnTransformer?

Input data set:

Country  Age  Salary
France   44   72000
Spain    27   48000
Germany  30   54000
Spain    38   61000
Germany  40   67000
France   35   58000
Spain    26   52000
France   48   79000
Germany  50   83000
France   37   67000

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# x is my dataset variable name
label_encoder = LabelEncoder()
x.iloc[:, 0] = label_encoder.fit_transform(x.iloc[:, 0])
# LabelEncoder is used to encode the country value
hot_encoder = OneHotEncoder(categorical_features=[0])
x = hot_encoder.fit_transform(x).toarray()

0(fran)  1(ger)  2(spain)  3(age)  4(salary)
1        0       0         44      72000
0        0       1         27      48000
0        1       0         30      54000
0        0       1         38      61000
0        1       0         40      67000
1        0       0         35      58000
0        0       1         36      52000
1        0       0         48      79000
0        1       0         50      83000
1        0       0         37      67000

3 Answers

User

Answer 1 · edited 2024-05-14 07:54:42

@Fawwaz Yusran To deal with this warning...

FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behaviour and silence this warning, you can specify "categories='auto'". In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly. warnings.warn(msg, FutureWarning)

Remove the following...

labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])

Because you are using OneHotEncoder directly, you no longer need the LabelEncoder.
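To illustrate the point, here is a minimal sketch of OneHotEncoder applied straight to a string column, with no LabelEncoder step first. The `countries` array is just the first few Country values from the question, and the variable names are my own; it assumes a scikit-learn release where string categories are accepted directly.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# First five Country values from the question, shaped as a single 2-D column
countries = np.array([["France"], ["Spain"], ["Germany"], ["Spain"], ["Germany"]])

# No LabelEncoder needed: the encoder learns the categories from the strings
encoder = OneHotEncoder()
encoded = encoder.fit_transform(countries).toarray()

# Learned categories are stored in sorted order (France, Germany, Spain)
print(encoder.categories_[0])
print(encoded)
```

The column order of the result follows `encoder.categories_`, so it is worth printing that attribute rather than assuming which column is which.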

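User

Answer 2 · edited 2024-05-14 07:54:42

It is odd to encode continuous data such as Salary. That makes no sense unless you have binned the salary into certain ranges/categories. If I were you, I would do: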

import pandas as pd
import numpy as np

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder



numeric_features = ['Salary']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['Age','Country']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

From there you can chain it with a classifier, e.g.

from sklearn.linear_model import LogisticRegression

clf = Pipeline(steps=[('preprocessor', preprocessor),
                  ('classifier', LogisticRegression(solver='lbfgs'))])

Use it like so:

clf.fit(X_train,y_train)

This applies the preprocessor and then passes the transformed data to the predictor.
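For anyone who wants to try this end to end, below is a rough sketch that fits the same kind of pipeline on the data from the question. Note the assumptions: the question has no target column, so the `y` labels here are made up purely so the classifier has something to fit, and the `Italy` row is likewise a hypothetical unseen input.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Data from the question
X = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, 26, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, 67000,
               58000, 52000, 79000, 83000, 67000],
})
# HYPOTHETICAL labels, not part of the question; for illustration only
y = [0, 1, 0, 0, 1, 1, 0, 1, 0, 1]

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))])

preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, ["Salary"]),
    ("cat", categorical_transformer, ["Age", "Country"])])

clf = Pipeline(steps=[("preprocessor", preprocessor),
                      ("classifier", LogisticRegression(solver="lbfgs"))])
clf.fit(X, y)

# 1 scaled Salary column + one column per distinct Age + one per Country
n_features = clf.named_steps["preprocessor"].transform(X).shape[1]
print(n_features)

# handle_unknown="ignore" lets the pipeline score a category it never saw
new_row = pd.DataFrame({"Country": ["Italy"], "Age": [33], "Salary": [60000]})
pred = clf.predict(new_row)
print(pred)
```

Because every Age in this small data set is distinct, treating Age as categorical produces one column per row, which is worth keeping in mind before reusing that column split on real data.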

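User

Answer 3 · edited 2024-05-14 07:54:42

I think the poster is not trying to transform Age and Salary. According to the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html), ColumnTransformer (and make_column_transformer) keeps only the columns specified in the transformers (i.e., [0] in your example). You should set remainder="passthrough" to get the rest of the columns. In other words: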

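from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

preprocessor = make_column_transformer((OneHotEncoder(), [0]), remainder="passthrough")
x = preprocessor.fit_transform(x)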
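Here is a self-contained sketch of that approach run on the data from the question, in case it helps to see the resulting layout. The DataFrame construction and the `sparse_threshold=0` argument are my additions; the latter just forces a dense array back so it can be printed directly.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Data from the question
x = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, 26, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, 67000,
               58000, 52000, 79000, 83000, 67000],
})

preprocessor = make_column_transformer(
    (OneHotEncoder(), [0]),      # one-hot encode the first column (Country)
    remainder="passthrough",     # keep Age and Salary unchanged
    sparse_threshold=0,          # always return a dense array
)
encoded = preprocessor.fit_transform(x)

# Columns: France, Germany, Spain, Age, Salary
print(encoded[:3])
```

The encoded columns come first and the passed-through columns follow, which matches the France/Germany/Spain/Age/Salary layout the question asks for.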
