StandardScaler removes the mean and scales the data to unit variance.
However, outliers influence the empirical mean and standard deviation
computed from the data, and this shrinks the range of the transformed
feature values, as shown in the left figure below. Note in particular that
because the outliers on each feature have different magnitudes, the
spread of the transformed data on each feature is very different: most
of the data lie in the [-2, 4] range for the transformed median income
feature, while the same data are squeezed into the smaller [-0.2, 0.2]
range for the transformed number of households.
StandardScaler therefore cannot guarantee balanced feature scales in
the presence of outliers.
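A minimal sketch of this effect, assuming scikit-learn and NumPy are installed. The toy values are made up for illustration and are not the dataset behind the figures: both columns share the same five inliers (1 to 5), and only the size of the outlier differs, so StandardScaler leaves the identical inliers with very different spreads.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up toy data: both columns share the inliers 1..5,
# but column 0 has a mild outlier (10) and column 1 an extreme one (1000).
X = np.array([
    [1.0, 1.0],
    [2.0, 2.0],
    [3.0, 3.0],
    [4.0, 4.0],
    [5.0, 5.0],
    [10.0, 1000.0],
])

scaled = StandardScaler().fit_transform(X)

# Every column now has mean 0 and (population) standard deviation 1 ...
print("means:", scaled.mean(axis=0))
print("stds: ", scaled.std(axis=0))

# ... but the outlier inflates each column's standard deviation by a
# different amount, so the identical inliers occupy very different widths.
inliers = scaled[:5]
spread = inliers.max(axis=0) - inliers.min(axis=0)
print("inlier spread per column:", spread)  # about 1.37 vs. about 0.011
```

The spread in column 1 is roughly a hundred times narrower than in column 0, even though the inlier values are identical.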
MinMaxScaler rescales the data set such that all feature values are in
the range [0, 1] as shown in the right panel below. However, this
scaling compresses all inliers into the narrow range [0, 0.005] for the
transformed number of households.
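A minimal sketch of the compression, assuming scikit-learn and NumPy are installed; the toy values are made up and are not the figure's dataset. Both columns share five inliers, and only the outlier differs. Because MinMaxScaler maps each column's minimum to 0 and its maximum to 1, the extreme outlier in the second column pushes that column's inliers into a sliver near 0.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up toy data: shared inliers 1..5, mild outlier (10) in column 0,
# extreme outlier (1000) in column 1.
X = np.array([
    [1.0, 1.0],
    [2.0, 2.0],
    [3.0, 3.0],
    [4.0, 4.0],
    [5.0, 5.0],
    [10.0, 1000.0],
])

# Each column is mapped via (x - min) / (max - min).
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

inliers = scaled[:5]
# Column 0 inliers reach 4/9 (about 0.44); column 1 inliers only 4/999 (about 0.004).
print("largest inlier value per column:", inliers.max(axis=0))
```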
(Source: the scikit-learn website.)
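MinMaxScaler(feature_range=(0, 1)) rescales each value in a column proportionally into the range [0, 1]. Use this as the first scaling option for transforming a feature, because it preserves the shape of the dataset (no distortion).
StandardScaler() transforms each value in a column so that the column has a mean of 0 and a standard deviation of 1; that is, it standardizes each value by subtracting the mean and dividing by the standard deviation. Use StandardScaler if you know the data are normally distributed.
If there are outliers, use RobustScaler(). Alternatively, you can remove the outliers and use either of the two scalers above (the choice depends on whether the data are normally distributed).
Additional note: fitting a scaler before the train/test split causes data leakage. Apply the scaler only after train_test_split: fit it on the training split, then use it to transform the test split.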
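A minimal sketch of the advice above, assuming scikit-learn and NumPy are installed; the synthetic data (95 normally distributed inliers plus 5 large outliers) are made up for illustration. RobustScaler centers each feature on its median and scales by the interquartile range (25th to 75th percentile), statistics that a few outliers barely move. To avoid leakage, the scaler is fitted on the training split only and then reused on the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)

# Made-up single-feature data: 95 inliers around 50, plus 5 outliers at 5000.
X = np.concatenate([
    rng.normal(loc=50.0, scale=5.0, size=95),
    np.full(5, 5000.0),
]).reshape(-1, 1)

# Split first, so no statistics from the test rows reach the scaler.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Fit on the training split only: learns the median (center_) and IQR (scale_).
scaler = RobustScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training statistics on the test split (no refitting).
X_test_scaled = scaler.transform(X_test)

print("training median:", scaler.center_, "training IQR:", scaler.scale_)
```

In a larger workflow, wrapping the scaler and the estimator in a sklearn.pipeline.Pipeline (for example via make_pipeline) gives the same fit-on-training-data behavior automatically, including inside cross-validation.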