Standardizing only the continuous variables with scikit-learn

Posted 2024-04-18 11:22:24


I have a dataset like this:

Total_Unique_items_bought   Total_Unique_Departments_visited    Total_qty_bought    Dept_Popular    Dept_Produce    Dept_D_n_E  Dept_Beverages  Dept_Pantry Dept_Frozen Dept_Deli   Dept_Bakery Dept_M_n_S  Dept_Household  Dept_Canned_Goods   Is_shopping_Monday  Is_shopping_Tuesday Is_shopping_Wednesday   Is_shopping_Thursday    Is_shopping_Friday  Is_shopping_Saturday    HrofDay_0_6 HrofDay_6_12    HrofDay_12_18   HrofDay_18_00   Is_store_id_3   Is_store_id_1   Is_store_id_5   Is_store_id_31  Is_store_id_29  Is_store_id_105 Is_store_id_115 Is_fulfillment_model_1  intercept
113818  26  10  36.0    1   1   0   0   1   0   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0   0   0   0   0   1   0   1   1
62172   22  11  32.0    1   1   1   1   1   1   0   1   0   0   0   1   0   0   0   0   0   0   0   1   0   1   0   0   0   0   0   0   0   1
116483  25  10  27.0    1   1   1   1   0   1   0   1   1   0   1   0   0   0   0   0   0   0   0   1   0   1   0   0   0   0   0   0   1   1
1240    14  5   20.0    1   0   0   1   0   0   1   1   1   0   0   0   1   0   0   0   0   0   0   1   0   1   0   0   0   0   0   0   0   1
22087   20  7   33.0    1   1   1   1

Only the first three columns are continuous. The rest are categorical (binary).

I changed the dtype of the remaining columns to "category"; the original snippet for this step is not preserved here.
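For illustration only, a minimal sketch of what such a dtype conversion might look like, assuming the data lives in a pandas DataFrame. This is not the asker's code: the miniature frame `df` (a few values copied from the rows above) and the names `continuous_cols` and `binary_cols` are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the dataset: the three continuous columns
# plus two of the binary flags, with values taken from the rows above.
df = pd.DataFrame({
    "Total_Unique_items_bought": [26, 22, 25, 14],
    "Total_Unique_Departments_visited": [10, 11, 10, 5],
    "Total_qty_bought": [36.0, 32.0, 27.0, 20.0],
    "Dept_Popular": [1, 1, 1, 1],
    "Dept_Produce": [1, 1, 1, 0],
})

continuous_cols = [
    "Total_Unique_items_bought",
    "Total_Unique_Departments_visited",
    "Total_qty_bought",
]

# Every column that is not continuous is treated as a binary flag.
binary_cols = [c for c in df.columns if c not in continuous_cols]

# Passing a dict to astype converts only the listed columns.
df = df.astype({c: "category" for c in binary_cols})

print(df.dtypes)
```

Note that the dtype change only affects how pandas stores the columns; it does not by itself tell a scaler which columns to skip.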

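Now I want to standardize these three variables in a way that lets me reuse the same transformer API, so the test data is standardized with the same fitted scaler. For this I used scikit-learn's StandardScaler: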

# Standardize the features of the dataset
from sklearn import preprocessing


def standardize(train_X, test_X):
    """Standardize every column to zero mean and unit standard deviation.

    The scaler is fitted on the training data only and returned so that it
    can be reused on any new data.
    """
    # Fit the scaler on the training data
    scaler = preprocessing.StandardScaler().fit(train_X)
    train_X_std = scaler.transform(train_X)

    # Apply the same fitted transformation to the test data
    test_X_std = scaler.transform(test_X)

    return train_X_std, test_X_std, scaler

But this also standardizes my categorical variables. How can I prevent that?

Edit

Using StandardScaler on the continuous columns only:

# Standardize only the continuous features of the dataset
from sklearn import preprocessing


def standardize(train_X, test_X):
    """Standardize the three continuous columns to zero mean and unit variance.

    Returns only the scaled columns (as arrays) together with the fitted scaler.
    """
    # Column names must match the DataFrame header exactly (note "Total_qty_bought")
    cols = ["Total_Unique_items_bought", "Total_Unique_Departments_visited", "Total_qty_bought"]

    # Fit the scaler on the training data
    scaler = preprocessing.StandardScaler().fit(train_X[cols])
    train_X_std = scaler.transform(train_X[cols])

    # Apply the same fitted transformation to the test data
    test_X_std = scaler.transform(test_X[cols])
    return train_X_std, test_X_std, scaler
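
One possible way to scale only the continuous columns while keeping the binary columns in the result is scikit-learn's `ColumnTransformer` with `remainder="passthrough"`. The sketch below is an illustration, not part of the original post: the function name `standardize_continuous`, the constant `CONTINUOUS_COLS`, and the miniature `train_X` / `test_X` frames (values copied from the rows above) are assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Assumed names of the continuous columns, taken from the table header.
CONTINUOUS_COLS = [
    "Total_Unique_items_bought",
    "Total_Unique_Departments_visited",
    "Total_qty_bought",
]


def standardize_continuous(train_X, test_X, cols=CONTINUOUS_COLS):
    """Scale only `cols`; pass every other column through unchanged."""
    transformer = ColumnTransformer(
        transformers=[("scale", StandardScaler(), cols)],
        remainder="passthrough",
    )
    # Fit on the training data only, then reuse the fitted transformer.
    train_arr = transformer.fit_transform(train_X)
    test_arr = transformer.transform(test_X)

    # The output holds the scaled columns first, then the passthrough
    # columns in their original order.
    out_cols = list(cols) + [c for c in train_X.columns if c not in cols]
    train_std = pd.DataFrame(train_arr, columns=out_cols, index=train_X.index)
    test_std = pd.DataFrame(test_arr, columns=out_cols, index=test_X.index)
    return train_std, test_std, transformer


# Hypothetical miniature data: three continuous columns and two binary flags.
train_X = pd.DataFrame({
    "Total_Unique_items_bought": [26, 22, 25, 14],
    "Total_Unique_Departments_visited": [10, 11, 10, 5],
    "Total_qty_bought": [36.0, 32.0, 27.0, 20.0],
    "Dept_Popular": [1, 1, 1, 1],
    "Dept_Produce": [1, 1, 1, 0],
})
test_X = pd.DataFrame({
    "Total_Unique_items_bought": [20],
    "Total_Unique_Departments_visited": [7],
    "Total_qty_bought": [33.0],
    "Dept_Popular": [1],
    "Dept_Produce": [1],
})

train_std, test_std, transformer = standardize_continuous(train_X, test_X)
print(train_std)
```

The returned `transformer` plays the same role as the `scaler` in the functions above: it can be applied to any later data with `transformer.transform(...)`, and the binary flags come back with their original 0/1 values.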
