如何快速填写Pandas数据框的未命中季度结束记录?

2024-04-28 15:18:56 发布

您现在位置:Python中文网/ 问答频道 /正文

我的数据帧df如下所示:

>>> df
                  sales  net_pft
STK_ID RPT_Date                 
000876 20061231  35.737    2.195
       20070630  20.247    1.653
       20080331  16.565    0.550
600141 20071231  15.915    0.440
       20080630   7.907    0.172
       20080930  12.505    0.503
       20090331   5.360    0.342
600809 20070331  15.274    2.606
       20070630   9.489    1.947
       20070930  13.017    2.464
       20080331   3.156    0.280

我想把每个STK_ID的未命中行与季度末列表RPT_日期列表进行对比

RPT_DATE_LIST = [
 '20060331', '20060630', '20060930', '20061231', \
 '20070331', '20070630', '20070930', '20071231', \
 '20080331', '20080630', '20080930', '20081231', \
 '20090331', '20090630', '20090930', '20091231']

结果应该是这样的:

>>> df
                  sales  net_pft
STK_ID RPT_Date                 
000876 20061231  35.737    2.195
       20070331  NaN       NaN
       20070630  20.247    1.653
       20070930  NaN       NaN
       20071231  NaN       NaN
       20080331  16.565    0.550
600141 20071231  15.915    0.440
       20080331  NaN       NaN
       20080630   7.907    0.172
       20080930  12.505    0.503
       20081231  NaN       NaN
       20090331   5.360    0.342
600809 20070331  15.274    2.606
       20070630   9.489    1.947
       20070930  13.017    2.464
       20071231  NaN       NaN
       20080331   3.156    0.280

如何快速做到这一点


Tags: 数据iddf列表datenetnanlist
1条回答
网友
1楼 · 发布于 2024-04-28 15:18:56
>>> def func(df, date_list):
...     lb, ub = df.RPT_Date.min(), df.RPT_Date.max()
...     pred = lambda x: lb <= x <= ub
...     df = df.set_index('RPT_Date').reindex_axis(filter(pred, date_list))
...     df.index.name = 'RPT_Date'
...     return df
... 
>>> grb = df.reset_index().groupby('STK_ID')
>>> grb.apply(func, date_list=RPT_DATE_LIST).drop('STK_ID', axis=1)
                  sales  net_pft
STK_ID RPT_Date                 
876    20061231  35.737    2.195
       20070331     NaN      NaN
       20070630  20.247    1.653
       20070930     NaN      NaN
       20071231     NaN      NaN
       20080331  16.565    0.550
600141 20071231  15.915    0.440
       20080331     NaN      NaN
       20080630   7.907    0.172
       20080930  12.505    0.503
       20081231     NaN      NaN
       20090331   5.360    0.342
600809 20070331  15.274    2.606
       20070630   9.489    1.947
       20070930  13.017    2.464
       20071231     NaN      NaN
       20080331   3.156    0.280

[17 rows x 2 columns]

相关问题 更多 >