How do I normalize a DataFrame so that line plots start from the same point?

Posted 2024-03-28 08:47:57


I have a DataFrame (named net_asset) covering 2015 to today, which looks like this:

    a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r
Date                                                                        
2015-04-30  162.20100   38.69620    98.88842    11.75094    8.92177 1.07767 112.81237   110.08090   NaN 4.20428 221.5440    NaN 1.63142 155.30297   8.19891 13.94684    7.40493 27.85345
2015-05-29  164.04053   39.19910    101.54701   11.97325    8.94295 1.12211 114.48715   113.24696   NaN 4.30719 215.7512    NaN 1.65257 154.85456   8.33938 14.29280    7.47724 27.32846
2015-06-30  163.17050   39.00262    101.77694   11.93908    8.96241 1.13880 114.23190   112.75483   10.0000 4.22515 207.5485    NaN 1.67049 158.25418   8.57353 14.13962    7.61546 26.99618
2015-07-31  160.73069   38.49814    102.63752   11.95354    8.93894 1.14438 111.00177   110.01403   10.1106 4.19375 205.0794    NaN 1.65833 161.83255   8.67075 14.25327    7.67866 27.31167

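To make the series easier to compare once plotted, I want every column to start from the same point, here 100 (that is, the first value in 2015 should be 100).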

I tried the code below, but it did not give what I had in mind, with 2015 at 100.

net_asset.apply(lambda x: (x - x.min()) / (x.max() - x.min()))

The code above returns the following. net_asset.head():

Date                                                                        
2015-04-30  29.481157   20.728226   12.566996   14.006493   24.887183   85.363231   11.168351   20.119944   NaN 26.292755   38.674209   NaN 19.586481   9.290352    5.570366    9.204228    4.566915    100.000000
2015-05-29  31.475018   22.683843   15.138121   16.334712   25.302741   95.113764   12.794772   25.172351   NaN 31.434296   34.177011   NaN 21.440216   9.022051    7.029734    11.419483   5.223939    95.558550
2015-06-30  30.531995   21.919795   15.360487   15.976855   25.684553   98.775698   12.546892   24.387008   26.207877   27.335452   27.808905   NaN 23.010851   11.056174   9.462360    10.438639   6.479836    92.747440
2015-07-31  27.887493   19.958033   16.192755   16.128292   25.224064   100.000000  9.410033    20.013232   27.427053   25.766660   25.892037   NaN 21.945063   13.197250   10.472396   11.166364   7.054085    95.416506

net_asset.tail():

2020-11-30  67.200005   72.608636   76.959357   85.856731   88.155809   57.219650   94.367147   84.263184   84.411962   49.771676   78.669830   91.698367   91.659509   95.793550   97.312319   100.000000  98.638703   12.572080
2020-12-31  79.321960   80.759312   87.806721   94.821595   96.394572   69.535073   99.215011   97.320232   87.610922   62.294533   89.893726   100.000000  100.000000  100.000000  100.000000  99.515149   100.000000  20.818697
2021-01-29  82.292270   80.581521   87.481611   92.795622   97.256100   70.575071   99.335197   93.571979   89.231346   58.588387   91.402937   92.293295   96.259225   96.302455   93.245683   95.127478   94.362002   20.405762
2021-02-26  91.587476   90.773715   91.445362   94.800335   98.102520   81.569651   95.674504   91.847156   97.434880   70.743028   97.713593   85.960528   89.612951   93.915749   88.721404   87.146839   88.763620   21.716141
2021-03-31  100.000000  100.000000  100.000000  100.000000  100.000000  91.807271   100.000000  97.903339   100.000000  81.996363   100.000000  94.200479   87.929251   89.484993   86.827664   86.035818   87.447754   19.689448

What is the way to do this? Thanks, everyone.

  • Some columns start with NaN but have values later on
  • In Excel, I divide each row by the first row and then multiply by 100: =(A2/$A$2)*100

1 answer
User
#1 · Posted 2024-03-28 08:47:57

To apply the normalization to each column, you must use axis=0.

Z-score standardization

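"The formula for calculating a z-score is z = (x - μ) / σ, where x is the raw score, μ is the population mean, and σ is the population standard deviation. As the formula shows, the z-score is simply the raw score minus the population mean, divided by the population standard deviation."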

# mean of each column
mean = df.mean(axis=0)
# standard deviation of each column
std = df.std(axis=0)
# z-score normalization, column by column
normalization = (df - mean) / std

Or in one line:

normalization = (df - df.mean()) / df.std()
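
One detail worth checking against the quoted formula: pandas' `std()` defaults to the sample standard deviation (`ddof=1`), while the quote refers to the population standard deviation. The sketch below shows both variants on a small made-up frame; the data and the names `z_sample` and `z_population` are illustrative assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's real frame); "b" contains a NaN
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, np.nan, 40.0]})

# pandas default: sample standard deviation (ddof=1)
z_sample = (df - df.mean()) / df.std()

# Population standard deviation (ddof=0), matching the quoted formula
z_population = (df - df.mean()) / df.std(ddof=0)
```

In both cases each column ends up with mean 0, and NaN entries are skipped when computing the statistics and remain NaN in the result.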

Min-max normalization

normalization = (df-df.min()) / (df.max()-df.min())

To put the values on a 0 to 100 scale, just multiply by 100:

normalization = (df - df.min()) / (df.max() - df.min()) * 100
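
Z-score and min-max scaling do not by themselves make the first observation equal 100, so the Excel approach described in the question (divide by the first value, multiply by 100) may be closer to what was asked. Here is a minimal sketch of that idea, assuming each column should be rebased to its own first non-NaN value. The small frame below only borrows a few values from the question, and the names `base` and `rebased` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample resembling the question's frame; column "i" starts with NaN
net_asset = pd.DataFrame(
    {
        "a": [162.20100, 164.04053, 163.17050, 160.73069],
        "i": [np.nan, np.nan, 10.0000, 10.1106],
    },
    index=pd.to_datetime(["2015-04-30", "2015-05-29", "2015-06-30", "2015-07-31"]),
)

# First non-NaN value of each column: back-fill, then take the first row
base = net_asset.bfill().iloc[0]

# Divide every row by its column's base value and scale to 100
rebased = net_asset.div(base, axis="columns") * 100
```

With this, column "a" starts at 100 on 2015-04-30, while column "i" stays NaN until its first value on 2015-06-30, where it starts at 100. Calling `rebased.plot()` should then draw every line from 100 at its own starting date. If all lines must start on the same date instead, one option is to drop or trim the columns that begin with NaN first.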
