I have a dataset like this:
Total_Unique_items_bought Total_Unique_Departments_visited Total_qty_bought Dept_Popular Dept_Produce Dept_D_n_E Dept_Beverages Dept_Pantry Dept_Frozen Dept_Deli Dept_Bakery Dept_M_n_S Dept_Household Dept_Canned_Goods Is_shopping_Monday Is_shopping_Tuesday Is_shopping_Wednesday Is_shopping_Thursday Is_shopping_Friday Is_shopping_Saturday HrofDay_0_6 HrofDay_6_12 HrofDay_12_18 HrofDay_18_00 Is_store_id_3 Is_store_id_1 Is_store_id_5 Is_store_id_31 Is_store_id_29 Is_store_id_105 Is_store_id_115 Is_fulfillment_model_1 intercept
113818 26 10 36.0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1
62172 22 11 32.0 1 1 1 1 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1
116483 25 10 27.0 1 1 1 1 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1
1240 14 5 20.0 1 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1
22087 20 7 33.0 1 1 1 1
Only the first three columns are continuous. The rest are categorical (binary).
I changed the data type of the rest to "category".
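The snippet for this step is not preserved in this copy of the question. A minimal sketch of one way such a cast could be done in pandas is shown here. It assumes the data lives in a DataFrame called `df` (a hypothetical name) and uses only a toy subset of the columns above:

```python
import pandas as pd

# Toy subset of the dataset above; `df` is a hypothetical name.
df = pd.DataFrame({
    "Total_Unique_items_bought": [26, 22, 25, 14],
    "Total_Unique_Departments_visited": [10, 11, 10, 5],
    "Total_qty_bought": [36.0, 32.0, 27.0, 20.0],
    "Dept_Popular": [1, 1, 1, 1],
    "Dept_Produce": [1, 1, 1, 0],
})

# The three continuous columns named in the question.
continuous_cols = [
    "Total_Unique_items_bought",
    "Total_Unique_Departments_visited",
    "Total_qty_bought",
]

# Every other column is treated as binary and cast to the "category" dtype.
binary_cols = [c for c in df.columns if c not in continuous_cols]
df = df.astype({c: "category" for c in binary_cols})

print(df.dtypes)
```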
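Now I want to standardize these three variables in a way that lets me reuse the same transformer API, with the same fitted scaler, to standardize the test data. For this I used sklearn's StandardScaler: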
    # Standardizing the features of the dataset
    def standardize(train_X, test_X):
        """ This function takes the dataset and standardizes each feature
        """
        # Standardizing the data with zero mean and unit standard deviation of each feature (column)
        from sklearn import preprocessing
        # Getting the standardizing scaler to be used for any new data too
        scaler = preprocessing.StandardScaler().fit(train_X)
        train_X_std = scaler.transform(train_X)
        # Using the same transformation fitted on training data to transform the test data
        test_X_std = scaler.transform(test_X)
        return train_X_std, test_X_std, scaler
But this also standardizes my categorical variables. How can I prevent that?

Edit

Using StandardScaler on the selected variables:
    # Standardizing the features of the dataset
    def standardize(train_X, test_X):
        """ This function takes the dataset and standardizes each feature
        """
        # Standardizing the data with zero mean and unit standard deviation of each feature (column)
        from sklearn import preprocessing
        # Only the continuous columns are scaled; names must match the DataFrame header exactly
        cols = ["Total_Unique_items_bought", "Total_Unique_Departments_visited", "Total_qty_bought"]
        # Getting the standardizing scaler to be used for any new data too
        scaler = preprocessing.StandardScaler().fit(train_X[cols])
        train_X_std = scaler.transform(train_X[cols])
        # Using the same transformation fitted on training data to transform the test data
        test_X_std = scaler.transform(test_X[cols])
        # Note: the returned arrays contain only the three scaled columns
        return train_X_std, test_X_std, scaler
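One possible way to scale only the three continuous columns while keeping the binary ones in the result is scikit-learn's `ColumnTransformer` with `remainder="passthrough"`. The sketch below is an illustration under assumptions, not the asker's code: the helper name `standardize_continuous` and the toy frames are hypothetical, only a few of the dataset's columns are included, and the binary columns are plain 0/1 integers. The transformer returns a NumPy array, so pandas dtypes such as `category` are not kept; the helper wraps the result back into a DataFrame for readability.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Continuous columns named in the question; all other columns are passed through.
CONTINUOUS_COLS = [
    "Total_Unique_items_bought",
    "Total_Unique_Departments_visited",
    "Total_qty_bought",
]


def standardize_continuous(train_X, test_X, cols=CONTINUOUS_COLS):
    """Hypothetical helper: scale only `cols`, leave the other columns unscaled."""
    transformer = ColumnTransformer(
        transformers=[("scale", StandardScaler(), list(cols))],
        remainder="passthrough",  # binary columns are copied over unchanged
    )
    # Fit on the training data only, then reuse the fitted transformer for test data.
    transformer.fit(train_X)

    # Output order: the scaled columns first, then the remaining columns
    # in their original order.
    out_cols = list(cols) + [c for c in train_X.columns if c not in cols]
    train_std = pd.DataFrame(
        transformer.transform(train_X), columns=out_cols, index=train_X.index
    )
    test_std = pd.DataFrame(
        transformer.transform(test_X), columns=out_cols, index=test_X.index
    )
    return train_std, test_std, transformer


# Toy data taken from a few columns of the rows shown in the question.
train = pd.DataFrame({
    "Total_Unique_items_bought": [26, 22, 25, 14],
    "Total_Unique_Departments_visited": [10, 11, 10, 5],
    "Total_qty_bought": [36.0, 32.0, 27.0, 20.0],
    "Dept_Popular": [1, 1, 1, 1],
    "Dept_Produce": [1, 1, 1, 0],
})
test = pd.DataFrame({
    "Total_Unique_items_bought": [20],
    "Total_Unique_Departments_visited": [7],
    "Total_qty_bought": [33.0],
    "Dept_Popular": [1],
    "Dept_Produce": [1],
})

train_std, test_std, fitted = standardize_continuous(train, test)
print(train_std)
print(test_std)
```

The returned `fitted` object can be applied to any later data with `fitted.transform(new_X)`, provided `new_X` has the same column names as the training frame.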