如何在调用resample后用值0填充na()?

2024-04-26 11:31:24 发布

您现在位置:Python中文网/ 问答频道 /正文

要么我不理解documentation要么它已经过时了。

如果我跑

user[["DOC_ACC_DT", "USER_SIGNON_ID"]].groupby("DOC_ACC_DT").agg(["count"]).resample("1D").fillna(value=0, method="ffill")

它得到

TypeError: fillna() got an unexpected keyword argument 'value'

如果我只是跑

.fillna(0)

我明白了

ValueError: Invalid fill method. Expecting pad (ffill), backfill (bfill) or nearest. Got 0

如果我这样做

.fillna(0, method="ffill") 

我明白了

TypeError: fillna() got multiple values for keyword argument 'method'

所以唯一有效的是

.fillna("ffill")

但这当然只是一个向前填充。但是,我想用零替换NaN。我在这里做错什么了?


Tags: docvaluedocumentationdtargumentkeywordmethodacc
3条回答

直接使用fillna的唯一解决方法是在执行.head(len(df.index))之后调用它。

我认为^{}在这种情况下是有用的,主要是因为当重采样函数应用于groupby对象时,它将充当输入的过滤器,由于组的消除而返回原始对象的简化形状。

调用DF.head()不受此转换的影响,并返回整个DF

演示:

np.random.seed(42)

df = pd.DataFrame(np.random.randn(10, 2),
              index=pd.date_range('1/1/2016', freq='10D', periods=10),
              columns=['A', 'B']).reset_index()

df
       index         A         B
0 2016-01-01  0.496714 -0.138264
1 2016-01-11  0.647689  1.523030
2 2016-01-21 -0.234153 -0.234137
3 2016-01-31  1.579213  0.767435
4 2016-02-10 -0.469474  0.542560
5 2016-02-20 -0.463418 -0.465730
6 2016-03-01  0.241962 -1.913280
7 2016-03-11 -1.724918 -0.562288
8 2016-03-21 -1.012831  0.314247
9 2016-03-31 -0.908024 -1.412304

操作:

resampled_group = df[['index', 'A']].groupby(['index'])['A'].agg('count').resample('2D')
resampled_group.head(len(resampled_group.index)).fillna(0).head(20)

index
2016-01-01    1.0
2016-01-03    0.0
2016-01-05    0.0
2016-01-07    0.0
2016-01-09    0.0
2016-01-11    1.0
2016-01-13    0.0
2016-01-15    0.0
2016-01-17    0.0
2016-01-19    0.0
2016-01-21    1.0
2016-01-23    0.0
2016-01-25    0.0
2016-01-27    0.0
2016-01-29    0.0
2016-01-31    1.0
2016-02-02    0.0
2016-02-04    0.0
2016-02-06    0.0
2016-02-08    0.0
Freq: 2D, Name: A, dtype: float64

好吧,我不明白为什么上面的代码不起作用,我要等有人给出比这更好的答案,但我发现

.replace(np.nan, 0)

做我想从.fillna(0)得到的事情。

我做了一些测试,很有趣。

样品:

import pandas as pd
import numpy as np

np.random.seed(1)
rng = pd.date_range('1/1/2012', periods=20, freq='S')
df = pd.DataFrame({'a':['a'] * 10 + ['b'] * 10,
                   'b':np.random.randint(0, 500, len(rng))}, index=rng)
df.b.iloc[3:8] = np.nan
print (df)
                     a      b
2012-01-01 00:00:00  a   37.0
2012-01-01 00:00:01  a  235.0
2012-01-01 00:00:02  a  396.0
2012-01-01 00:00:03  a    NaN
2012-01-01 00:00:04  a    NaN
2012-01-01 00:00:05  a    NaN
2012-01-01 00:00:06  a    NaN
2012-01-01 00:00:07  a    NaN
2012-01-01 00:00:08  a  335.0
2012-01-01 00:00:09  a  448.0
2012-01-01 00:00:10  b  144.0
2012-01-01 00:00:11  b  129.0
2012-01-01 00:00:12  b  460.0
2012-01-01 00:00:13  b   71.0
2012-01-01 00:00:14  b  237.0
2012-01-01 00:00:15  b  390.0
2012-01-01 00:00:16  b  281.0
2012-01-01 00:00:17  b  178.0
2012-01-01 00:00:18  b  276.0
2012-01-01 00:00:19  b  254.0

下采样

使用^{}可能的解决方案:

如果使用asfreq,则行为与通过first聚合相同:

print (df.groupby('a').resample('2S').first())
                       a      b
a                              
a 2012-01-01 00:00:00  a   37.0
  2012-01-01 00:00:02  a  396.0
  2012-01-01 00:00:04  a    NaN
  2012-01-01 00:00:06  a    NaN
  2012-01-01 00:00:08  a  335.0
b 2012-01-01 00:00:10  b  144.0
  2012-01-01 00:00:12  b  460.0
  2012-01-01 00:00:14  b  237.0
  2012-01-01 00:00:16  b  281.0
  2012-01-01 00:00:18  b  276.0
print (df.groupby('a').resample('2S').first().fillna(0))
                       a      b
a                              
a 2012-01-01 00:00:00  a   37.0
  2012-01-01 00:00:02  a  396.0
  2012-01-01 00:00:04  a    0.0
  2012-01-01 00:00:06  a    0.0
  2012-01-01 00:00:08  a  335.0
b 2012-01-01 00:00:10  b  144.0
  2012-01-01 00:00:12  b  460.0
  2012-01-01 00:00:14  b  237.0
  2012-01-01 00:00:16  b  281.0
  2012-01-01 00:00:18  b  276.0

print (df.groupby('a').resample('2S').asfreq().fillna(0))
                       a      b
a                              
a 2012-01-01 00:00:00  a   37.0
  2012-01-01 00:00:02  a  396.0
  2012-01-01 00:00:04  a    0.0
  2012-01-01 00:00:06  a    0.0
  2012-01-01 00:00:08  a  335.0
b 2012-01-01 00:00:10  b  144.0
  2012-01-01 00:00:12  b  460.0
  2012-01-01 00:00:14  b  237.0
  2012-01-01 00:00:16  b  281.0
  2012-01-01 00:00:18  b  276.0

如果使用replace,则另一个值聚合为mean

print (df.groupby('a').resample('2S').mean())
                           b
a                           
a 2012-01-01 00:00:00  136.0
  2012-01-01 00:00:02  396.0
  2012-01-01 00:00:04    NaN
  2012-01-01 00:00:06    NaN
  2012-01-01 00:00:08  391.5
b 2012-01-01 00:00:10  136.5
  2012-01-01 00:00:12  265.5
  2012-01-01 00:00:14  313.5
  2012-01-01 00:00:16  229.5
  2012-01-01 00:00:18  265.0
print (df.groupby('a').resample('2S').mean().fillna(0))
                           b
a                           
a 2012-01-01 00:00:00  136.0
  2012-01-01 00:00:02  396.0
  2012-01-01 00:00:04    0.0
  2012-01-01 00:00:06    0.0
  2012-01-01 00:00:08  391.5
b 2012-01-01 00:00:10  136.5
  2012-01-01 00:00:12  265.5
  2012-01-01 00:00:14  313.5
  2012-01-01 00:00:16  229.5
  2012-01-01 00:00:18  265.0

print (df.groupby('a').resample('2S').replace(np.nan,0))
                           b
a                           
a 2012-01-01 00:00:00  136.0
  2012-01-01 00:00:02  396.0
  2012-01-01 00:00:04    0.0
  2012-01-01 00:00:06    0.0
  2012-01-01 00:00:08  391.5
b 2012-01-01 00:00:10  136.5
  2012-01-01 00:00:12  265.5
  2012-01-01 00:00:14  313.5
  2012-01-01 00:00:16  229.5
  2012-01-01 00:00:18  265.0

上采样

使用asfreq,它与replace相同:

print (df.groupby('a').resample('200L').asfreq().fillna(0))
                           a      b
a                                  
a 2012-01-01 00:00:00.000  a   37.0
  2012-01-01 00:00:00.200  0    0.0
  2012-01-01 00:00:00.400  0    0.0
  2012-01-01 00:00:00.600  0    0.0
  2012-01-01 00:00:00.800  0    0.0
  2012-01-01 00:00:01.000  a  235.0
  2012-01-01 00:00:01.200  0    0.0
  2012-01-01 00:00:01.400  0    0.0
  2012-01-01 00:00:01.600  0    0.0
  2012-01-01 00:00:01.800  0    0.0
  2012-01-01 00:00:02.000  a  396.0
  2012-01-01 00:00:02.200  0    0.0
  2012-01-01 00:00:02.400  0    0.0
  ...

print (df.groupby('a').resample('200L').replace(np.nan,0))
                               b
a                               
a 2012-01-01 00:00:00.000   37.0
  2012-01-01 00:00:00.200    0.0
  2012-01-01 00:00:00.400    0.0
  2012-01-01 00:00:00.600    0.0
  2012-01-01 00:00:00.800    0.0
  2012-01-01 00:00:01.000  235.0
  2012-01-01 00:00:01.200    0.0
  2012-01-01 00:00:01.400    0.0
  2012-01-01 00:00:01.600    0.0
  2012-01-01 00:00:01.800    0.0
  2012-01-01 00:00:02.000  396.0
  2012-01-01 00:00:02.200    0.0
  2012-01-01 00:00:02.400    0.0
  ...
print ((df.groupby('a').resample('200L').replace(np.nan,0).b == 
       df.groupby('a').resample('200L').asfreq().fillna(0).b).all())
True

结论:

对于下采样,使用相同的聚合函数,如sumfirstmean,对于上采样asfreq

相关问题 更多 >