Scaling pandas DataFrame columns with sklearn

import pandas as pd
import numpy as np
from sklearn import preprocessing

scaler = preprocessing.MinMaxScaler()

dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],
                       'B':[103.02,107.26,110.35,114.23,114.68],
                       'C':['big','small','big','small','small']})
min_max_scaler = preprocessing.MinMaxScaler()

def scaleColumns(df, cols_to_scale):
    for col in cols_to_scale:
        # Scale the frame passed in (df), not the global dfTest
        df[col] = pd.DataFrame(min_max_scaler.fit_transform(pd.DataFrame(df[col])),columns=[col])
    return df

dfTest
       A       B      C
0  14.00  103.02    big
1  90.20  107.26  small
2  90.95  110.35    big
3  96.27  114.23  small
4  91.21  114.68  small

scaled_df = scaleColumns(dfTest,['A','B'])
scaled_df
          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small

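3 Answers

User

Answer 1 · edited 2024-05-23 18:08:27

I'm not sure whether earlier versions of pandas prevented this, but the snippet below now works well for me and produces exactly what you want, without having to use apply: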

>>> import pandas as pd
>>> from sklearn.preprocessing import MinMaxScaler


>>> scaler = MinMaxScaler()

>>> dfTest = pd.DataFrame({'A':[14.00,90.20,90.95,96.27,91.21],
                           'B':[103.02,107.26,110.35,114.23,114.68],
                           'C':['big','small','big','small','small']})

>>> dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A', 'B']])

>>> dfTest
          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small
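
To make it clearer what the scaler is doing in the snippet above, here is a minimal sketch that reproduces the same numbers by hand with plain pandas. It assumes the default feature range of 0 to 1; the names `cols` and `manual` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                   'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                   'C': ['big', 'small', 'big', 'small', 'small']})

cols = ['A', 'B']  # numeric columns to scale (illustrative name)

# Min-max scaling per column: (x - min) / (max - min)
col_min = df[cols].min()
col_max = df[cols].max()
manual = (df[cols] - col_min) / (col_max - col_min)

# Write the scaled values back; column 'C' is left untouched
df[cols] = manual
print(df)
```

This should agree with the `MinMaxScaler` output shown above to the printed precision, which is a handy sanity check when only a subset of columns is scaled.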

User

Answer 2 · edited 2024-05-23 18:08:27

As mentioned in pir's comment, the .apply(lambda el: scale.fit_transform(el)) approach produces the following warning:

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

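Converting the columns to a numpy array should do the job (I prefer StandardScaler):

from sklearn.preprocessing import StandardScaler
scale = StandardScaler()

# Scale only the numeric columns; 'C' holds strings and cannot be scaled
dfTest[['A','B']] = scale.fit_transform(dfTest[['A','B']].as_matrix())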
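
The warning quoted above is about array shape: the scaler expects 2-D input of shape (n_samples, n_features). As a rough sketch of the two usual ways to get that shape for a single column, assuming a reasonably recent pandas (0.24 or later for `to_numpy`); the names `col_2d` and `A_scaled` are made up for this illustration:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21]})
scaler = StandardScaler()

# Option 1: reshape the 1-D values into a single-feature 2-D array
col_2d = df['A'].to_numpy().reshape(-1, 1)   # shape (5, 1)
df['A_scaled'] = scaler.fit_transform(col_2d).ravel()

# Option 2: select with a list of column names, which keeps a 2-D frame
frame_2d = df[['A']]                          # shape (5, 1)
also_scaled = scaler.fit_transform(frame_2d).ravel()

print(df)
```

Both options feed the scaler the same data, so the two results should match.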

-- Edit, November 2018 (tested with pandas 0.23.4) --

As Rob Murray mentioned in the comments, in the current (v0.23.4) version of pandas, .as_matrix() raises a FutureWarning. It should therefore be replaced with .values:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

scaler.fit_transform(dfTest[['A','B']].values)

-- Edit, May 2019 (tested with pandas 0.24.2) --

As joelostblom mentioned in the comments, "Since 0.24.0, it is recommended to use .to_numpy() instead of .values."

Updated example:

import pandas as pd
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
dfTest = pd.DataFrame({
               'A':[14.00,90.20,90.95,96.27,91.21],
               'B':[103.02,107.26,110.35,114.23,114.68],
               'C':['big','small','big','small','small']
             })
dfTest[['A', 'B']] = scaler.fit_transform(dfTest[['A','B']].to_numpy())
dfTest
          A         B      C
0 -1.995290 -1.571117    big
1  0.436356 -0.603995  small
2  0.460289  0.100818    big
3  0.630058  0.985826  small
4  0.468586  1.088469  small
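
For anyone comparing these numbers against a hand calculation: StandardScaler divides by the population standard deviation (ddof=0), whereas pandas' `.std()` defaults to ddof=1. A small sketch under that assumption, using plain pandas; `cols` and `manual` are illustrative names only:

```python
import pandas as pd

df = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                   'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                   'C': ['big', 'small', 'big', 'small', 'small']})

cols = ['A', 'B']  # numeric columns to standardize (illustrative name)

# z-score per column: (x - mean) / std, with ddof=0 to mirror StandardScaler
manual = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)

df[cols] = manual
print(df)
```

With the default `ddof=1` the values would come out slightly smaller in magnitude, which is a common source of confusion when the two results are compared.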

User
Answer 3 · edited 2024-05-23 18:08:27

Like this?

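import pandas as pd
from sklearn.preprocessing import MinMaxScaler

dfTest = pd.DataFrame({
           'A':[14.00,90.20,90.95,96.27,91.21],
           'B':[103.02,107.26,110.35,114.23,114.68],
           'C':['big','small','big','small','small']
         })
# Note: apply passes each column as a 1-D Series; recent scikit-learn
# versions raise a ValueError for 1-D input (see the warning in answer 2)
dfTest[['A','B']] = dfTest[['A','B']].apply(
                           lambda x: MinMaxScaler().fit_transform(x))
dfTest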

          A         B      C
0  0.000000  0.000000    big
1  0.926219  0.363636  small
2  0.935335  0.628645    big
3  1.000000  0.961407  small
4  0.938495  1.000000  small
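
If you do want to keep the apply style, one possible variant is to hand the scaler a 2-D, single-column frame and flatten the result afterwards. This is only a sketch, not the original poster's code; the helper name `scale_column` is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'A': [14.00, 90.20, 90.95, 96.27, 91.21],
                   'B': [103.02, 107.26, 110.35, 114.23, 114.68],
                   'C': ['big', 'small', 'big', 'small', 'small']})

def scale_column(s):
    """Min-max scale one column (hypothetical helper for this sketch)."""
    # to_frame() turns the 1-D Series into a 2-D, single-feature frame
    scaled = MinMaxScaler().fit_transform(s.to_frame())
    # Flatten back to 1-D and keep the original index for alignment
    return pd.Series(scaled.ravel(), index=s.index)

df[['A', 'B']] = df[['A', 'B']].apply(scale_column)
print(df)
```

Because a fresh scaler is fitted per column, this gives the same values as fitting one MinMaxScaler on both columns at once; the single-call form in answer 1 is shorter and keeps the fitted scaler around for later reuse.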
