pandas数据框按年份和公司添加具有函数的列

2024-04-18 13:25:41 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个名为'sm'的数据框,下面有很多行,我想按公司和年份计算销售增长率,并将其插入新的列名'sg'。你知道吗

    fyear   ch      conm        sale    ipodate
0   1996    51.705  AAR CORP    589.328 NaN
1   1997    17.222  AAR CORP    782.123 NaN
2   1998    8.250   AAR CORP    918.036 NaN
3   1999    1.241   AAR CORP    1024.333    NaN
4   2000    13.809  AAR CORP    874.255 NaN
5   2001    34.522  AAR CORP    638.721 NaN
6   2002    29.154  AAR CORP    606.337 NaN
7   2003    41.010  AAR CORP    651.958 NaN
8   2004    40.508  AAR CORP    747.848 NaN
9   2005    121.738 AAR CORP    897.284 NaN
10  2006    83.317  AAR CORP    1061.169    NaN
11  2007    109.391 AAR CORP    1384.919    NaN
12  2008    112.505 AAR CORP    1423.976    NaN
13  2009    79.370  AAR CORP    1352.151    NaN
14  2010    57.433  AAR CORP    1775.782    NaN
15  2011    67.720  AAR CORP    2074.498    NaN
16  2012    75.300  AAR CORP    2167.100    NaN
17  2013    89.200  AAR CORP    2035.000    NaN
18  2014    54.700  AAR CORP    1594.300    NaN
19  2015    31.200  AAR CORP    1662.600    NaN
20  1997    64.000  AMERICAN AIRLINES GROUP INC 18570.000   NaN
21  1998    95.000  AMERICAN AIRLINES GROUP INC 19205.000   NaN
22  1999    85.000  AMERICAN AIRLINES GROUP INC 17730.000   NaN
23  2000    89.000  AMERICAN AIRLINES GROUP INC 19703.000   NaN
24  2001    120.000 AMERICAN AIRLINES GROUP INC 18963.000   NaN
115466  2014    290.500 ALLEGION PLC    2118.300    NaN
115467  2015    199.700 ALLEGION PLC    2068.100    NaN
115468  2016    312.400 ALLEGION PLC    2238.000    NaN
115470  2013    2.063   AGILITY HEALTH INC  63.052  NaN
115471  2014    1.301   AGILITY HEALTH INC  62.105  NaN
115472  2015    1.307   AGILITY HEALTH INC  62.328  NaN
115473  2013    109.819 NORDIC AMERICAN OFFSHORE    NaN NaN
115474  2014    46.398  NORDIC AMERICAN OFFSHORE    52.789  NaN
115475  2015    5.339   NORDIC AMERICAN OFFSHORE    36.372  NaN
115476  2016    2.953   NORDIC AMERICAN OFFSHORE    16.249  NaN
115477  2011    2.040   DORIAN LPG LTD  34.571  20140508.0
115478  2012    1.042   DORIAN LPG LTD  38.662  20140508.0
115479  2013    279.132 DORIAN LPG LTD  29.634  20140508.0
115480  2014    204.821 DORIAN LPG LTD  104.129 20140508.0
115481  2015    46.412  DORIAN LPG LTD  289.208 20140508.0
115482  2013    948.684 NOMAD FOODS LTD 2074.842    NaN
115483  2014    855.541 NOMAD FOODS LTD 1816.239    NaN
115484  2015    671.846 NOMAD FOODS LTD 971.013 NaN
115485  2016    347.688 NOMAD FOODS LTD 2034.109    NaN
115487  2014    2638.000    ATHENE HOLDING LTD  4100.000    20161209.0
115488  2015    2720.000    ATHENE HOLDING LTD  2616.000    20161209.0
115489  2016    2459.000    ATHENE HOLDING LTD  4107.000    20161209.0
115490  2013    3.956   MIDATECH PHARMA PLC 0.244   NaN
115491  2014    47.240  MIDATECH PHARMA PLC 0.245   NaN
115492  2015    23.852  MIDATECH PHARMA PLC 2.028   NaN
115493  2016    21.723  MIDATECH PHARMA PLC 8.541   NaN

我实现了这样的代码

d = sm.loc[sm['conm'] == 'AAR CORP']
dt = d.loc[d.fyear == 1996,'sale'].values[0]
dtp1 = d.loc[d.fyear == 1997,'sale'].values[0]
sg = (dtp1-dt)/ dt * 100
d.ix[d.fyear == 1997,'sg'] = sg

它给了我最后的专栏。你知道吗

    fyear   ch      conm        sale    ipodate  sg
0   1996    51.705  AAR CORP    589.328 NaN      Nan
1   1997    17.222  AAR CORP    782.123 NaN      32.71438
2   1998    8.250   AAR CORP    918.036 NaN      Nan

我希望“sg”列位于“sale”列的旁边,我希望计算每个公司每一年(1996-2015年)的销售增长率,并插入到给定年份t的同一行中。现在,我将公司名称切片到小数据框中,然后计算销售增长率,但由于我唯一的公司名称超过9000,所以我现在的方法似乎效率很低。我可以不按所有公司的名字来划分吗?先谢谢你。你知道吗


Tags: group公司salenansgincplcamerican
3条回答
#sort df first
df.sort_values(by=['conm','fyear'],inplace=True)
#you can use groupby first by company, get the rolling increase for each company and then insert it to the dataframe.

df.insert(df.columns.tolist().index('sale')+1,
          'sm',df.groupby(by=['conm'])['sale']\
          .apply(lambda x: x.rolling(2).apply(lambda y: (y[1]-y[0])/y[0]*100)))

df
Out[296]: 
    fyear       ch                         conm       sale         sm  ipodate
0    1996   51.705                     AAR CORP    589.328        NaN      NaN
1    1997   17.222                     AAR CORP    782.123  32.714380      NaN
2    1998    8.250                     AAR CORP    918.036  17.377446      NaN
3    1999    1.241                     AAR CORP   1024.333  11.578740      NaN
4    2000   13.809                     AAR CORP    874.255 -14.651290      NaN
5    2001   34.522                     AAR CORP    638.721 -26.941110      NaN
6    2002   29.154                     AAR CORP    606.337  -5.070132      NaN
7    2003   41.010                     AAR CORP    651.958   7.524034      NaN
8    2004   40.508                     AAR CORP    747.848  14.708003      NaN
9    2005  121.738                     AAR CORP    897.284  19.982135      NaN
10   2006   83.317                     AAR CORP   1061.169  18.264563      NaN
11   2007  109.391                     AAR CORP   1384.919  30.508807      NaN
12   2008  112.505                     AAR CORP   1423.976   2.820165      NaN
13   2009   79.370                     AAR CORP   1352.151  -5.043975      NaN
14   2010   57.433                     AAR CORP   1775.782  31.330155      NaN
15   2011   67.720                     AAR CORP   2074.498  16.821659      NaN
16   2012   75.300                     AAR CORP   2167.100   4.463827      NaN
17   2013   89.200                     AAR CORP   2035.000  -6.095704      NaN
18   2014   54.700                     AAR CORP   1594.300 -21.656020      NaN
19   2015   31.200                     AAR CORP   1662.600   4.284012      NaN
20   1997   64.000  AMERICAN AIRLINES GROUP INC  18570.000        NaN      NaN
21   1998   95.000  AMERICAN AIRLINES GROUP INC  19205.000   3.419494      NaN
22   1999   85.000  AMERICAN AIRLINES GROUP INC  17730.000  -7.680292      NaN
23   2000   89.000  AMERICAN AIRLINES GROUP INC  19703.000  11.128032      NaN
24   2001  120.000  AMERICAN AIRLINES GROUP INC  18963.000  -3.755773      NaN

另一种可能的解决方案(因为它不使用apply(),所以可能更快):

df['sm'] = (df.sort_values(['conm', 'fyear'])\
              .groupby('conm')['sale']\
              .diff()\
              .shift(-1) / df['sale']).shift() * 100

此解决方案假设连续年份之间始终存在1年的差异。你知道吗

看看pct_change

df['sg'] = df[['sale']].pct_change()

相关问题 更多 >