pandas groupby.diff: fill missing rows with zero

Posted 2024-04-19 04:23:18


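I'm sure this has been posted somewhere, or it is so simple that I'm not seeing it, but I haven't had any luck finding a post. Any help would be much appreciated.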

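I want to do a groupby diff, as shown below. If a date is missing for an ID/ticker pair, I need to show a negative value, treating the missing shares as zero.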

df['delta'] = df.groupby(['ID', 'ticker', 'date'])['shares'].diff()

ID  ticker date         shares  delta
A   AAA    3/31/2012    904180  675010
A   AAA    12/31/2011   229170  NaN
A   BBB    3/31/2012    517756  390117
A   BBB    12/31/2011   127639  NaN
A   CCC    12/31/2011   1757    NaN
A   DDD    12/31/2011   500     NaN
B   AAA    3/31/2012    920920  554920
B   AAA    12/31/2011   366000  NaN
B   BBB    3/31/2012    524     393
B   BBB    12/31/2011   131     NaN

I think I need to fill in the missing rows to get this:

ID  ticker date         shares  delta
A   AAA    3/31/2012    904180  675010
A   AAA    12/31/2011   229170  NaN
A   BBB    3/31/2012    517756  390117
A   BBB    12/31/2011   127639  NaN
A   CCC    3/31/2012    0       -1757
A   CCC    12/31/2011   1757    NaN
A   DDD    3/31/2012    0       -500
A   DDD    12/31/2011   500     NaN
B   AAA    3/31/2012    920920  554920
B   AAA    12/31/2011   366000  NaN
B   BBB    3/31/2012    524     393
B   BBB    12/31/2011   131     NaN

Thanks again.


1 answer
User
#1 · Posted 2024-04-19 04:23:18

Use unstack + stack:

# Pivot the dates into columns so every (ID, ticker) pair gets a slot for every date,
# then stack back while keeping the missing combinations, and fill them with 0.
# Note: newer pandas versions may warn about or reject dropna= in stack().
New_df = df.set_index(['ID', 'ticker', 'date']).unstack('date').stack(dropna=False).reset_index().fillna(0)

# Do not group by date: each (ID, ticker, date) group holds a single row, so diff() returns all NaN.
New_df['delta'] = New_df.groupby(['ID', 'ticker'])['shares'].diff()
New_df
Out[316]: 
   ID ticker        date    shares     delta
0   A    AAA  12/31/2011  229170.0       NaN
1   A    AAA   3/31/2012  904180.0  675010.0
2   A    BBB  12/31/2011  127639.0       NaN
3   A    BBB   3/31/2012  517756.0  390117.0
4   A    CCC  12/31/2011    1757.0       NaN
5   A    CCC   3/31/2012       0.0   -1757.0
6   A    DDD  12/31/2011     500.0       NaN
7   A    DDD   3/31/2012       0.0    -500.0
8   B    AAA  12/31/2011  366000.0       NaN
9   B    AAA   3/31/2012  920920.0  554920.0
10  B    BBB  12/31/2011     131.0       NaN
11  B    BBB   3/31/2012     524.0     393.0
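
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's data by hand and, as a swapped-in alternative to unstack + stack, fills the missing rows with a cross join followed by a left merge, since the behaviour of `stack(dropna=False)` can depend on the pandas version. The names `pairs`, `dates`, `full` and `filled` are my own, and parsing `date` with `pd.to_datetime` is an assumption about how the column should be treated. `how='cross'` needs pandas 1.2 or later.

```python
import pandas as pd

# Data copied from the question (without the delta column).
df = pd.DataFrame({
    'ID':     ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'ticker': ['AAA', 'AAA', 'BBB', 'BBB', 'CCC', 'DDD', 'AAA', 'AAA', 'BBB', 'BBB'],
    'date':   ['3/31/2012', '12/31/2011', '3/31/2012', '12/31/2011', '12/31/2011',
               '12/31/2011', '3/31/2012', '12/31/2011', '3/31/2012', '12/31/2011'],
    'shares': [904180, 229170, 517756, 127639, 1757, 500, 920920, 366000, 524, 131],
})
# Assumption: treat the dates as real timestamps so they sort chronologically.
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Grouping by date as well leaves one row per group, so every diff is NaN.
all_nan = df.groupby(['ID', 'ticker', 'date'])['shares'].diff().isna().all()

# Every existing (ID, ticker) pair crossed with every date seen in the data.
pairs = df[['ID', 'ticker']].drop_duplicates()
dates = df[['date']].drop_duplicates()
full = pairs.merge(dates, how='cross')

# Left merge brings in the known shares; missing combinations become 0.
filled = full.merge(df, on=['ID', 'ticker', 'date'], how='left').fillna({'shares': 0})
filled = filled.sort_values(['ID', 'ticker', 'date']).reset_index(drop=True)

# Diff within each (ID, ticker) pair, oldest date first.
filled['delta'] = filled.groupby(['ID', 'ticker'])['shares'].diff()
print(filled)
```

The cross join only expands pairs that already exist in the data, so it does not invent combinations such as B/CCC, which matches what unstacking on `date` alone does.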

After sorting:

New_df.sort_values(['ID','ticker','date'],ascending=[True,True,False])
Out[318]: 
   ID ticker        date    shares     delta
1   A    AAA   3/31/2012  904180.0  675010.0
0   A    AAA  12/31/2011  229170.0       NaN
3   A    BBB   3/31/2012  517756.0  390117.0
2   A    BBB  12/31/2011  127639.0       NaN
5   A    CCC   3/31/2012       0.0   -1757.0
4   A    CCC  12/31/2011    1757.0       NaN
7   A    DDD   3/31/2012       0.0    -500.0
6   A    DDD  12/31/2011     500.0       NaN
9   B    AAA   3/31/2012  920920.0  554920.0
8   B    AAA  12/31/2011  366000.0       NaN
11  B    BBB   3/31/2012     524.0     393.0
10  B    BBB  12/31/2011     131.0       NaN
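
One caveat about the sort above: `date` is a string column here, and '3/31/2012' sorts after '12/31/2011' only because '3' compares greater than '1' as text. With more dates the string order can diverge from the chronological one. The sketch below illustrates this with a small made-up frame; the third row (9/30/2011, 100 shares) and the `date_parsed` column are hypothetical and not part of the question's data.

```python
import pandas as pd

demo = pd.DataFrame({
    'ID':     ['A', 'A', 'A'],
    'ticker': ['AAA', 'AAA', 'AAA'],
    'date':   ['12/31/2011', '3/31/2012', '9/30/2011'],  # last date is made up
    'shares': [229170, 904180, 100],                     # 100 is made up
})

# Sorting the raw strings compares them character by character.
by_string = demo.sort_values('date')['date'].tolist()

# Parsing first gives a true chronological order.
demo['date_parsed'] = pd.to_datetime(demo['date'], format='%m/%d/%Y')
by_time = demo.sort_values('date_parsed')['date'].tolist()

# Same layout as the answer: newest date first within each ID/ticker pair.
newest_first = demo.sort_values(['ID', 'ticker', 'date_parsed'],
                                ascending=[True, True, False])
print(by_string)
print(by_time)
print(newest_first)
```

Since `diff()` depends on row order inside each group, it is probably safest to parse the dates before computing `delta` as well, not just before the final sort.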
