Resampling booleans in pandas

import pandas as pd
import numpy as np

dr = pd.date_range('01-01-2020 5:00', periods=10, freq='H')
df = pd.DataFrame({'Bools': [True, True, False, False, False, True, True, np.nan, np.nan, False],
                   "Nums": range(10)}, index=dr)

                     Bools  Nums
2020-01-01 05:00:00   True     0
2020-01-01 06:00:00   True     1
2020-01-01 07:00:00  False     2
2020-01-01 08:00:00  False     3
2020-01-01 09:00:00  False     4
2020-01-01 10:00:00   True     5
2020-01-01 11:00:00   True     6
2020-01-01 12:00:00    NaN     7
2020-01-01 13:00:00    NaN     8
2020-01-01 14:00:00  False     9

>>> r = df.resample('5H')
>>> copy = df.copy()  # just doing this to preserve df for the example
>>> copy['Bools'] = copy['Bools'].astype(float)
>>> copy.resample('5H').sum()
                     Bools  Nums
2020-01-01 05:00:00    2.0    10
2020-01-01 10:00:00    2.0    35

2 answers

User

Answer 1 · edited 2024-05-16 17:56:53

Tracing the calls shows:

df.resample('5H')['Bools'].sum == Groupby.sum (in pd.core.groupby.generic.SeriesGroupBy)

df.resample('5H').sum == sum (in pandas.core.resample.DatetimeIndexResampler)

Tracing groupby_function in groupby.py shows that it is equivalent to r.agg(lambda x: np.sum(x, axis=r.axis)), where r = df.resample('5H'). Output:

                     Bools  Nums  Nums2
2020-01-01 05:00:00      2    10     10
2020-01-01 10:00:00      2    35     35

Strictly speaking, it should be r = df.resample('5H')['Bools'] (this applies only to the case above).
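A minimal sketch of the two entry points traced above: selecting the column on the resampler and then summing, versus summing the whole resampled frame and then picking the column. It assumes Bools has already been cast to float so that both paths keep it, and it uses the 'h' alias, which newer pandas prefers over 'H'. It does not reproduce the internal classes named in the trace.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; Bools is cast to float so both paths treat it as numeric.
dr = pd.date_range("2020-01-01 05:00", periods=10, freq="h")
df = pd.DataFrame(
    {"Bools": [True, True, False, False, False, True, True, np.nan, np.nan, False],
     "Nums": range(10)},
    index=dr,
).astype({"Bools": float})

# Entry point 1: select the column on the resampler, then sum (Series path).
column_first = df.resample("5h")["Bools"].sum()

# Entry point 2: sum the whole resampled frame, then pick the column (DataFrame path).
frame_first = df.resample("5h").sum()["Bools"]

print(column_first)
print(frame_first)
```

Once the column is numeric, both routes give the same per-bin counts of True values.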

Tracing the _downsample function in resample.py shows that it is equivalent to df.groupby(r.grouper, axis=r.axis).agg(np.sum). Output:

                     Nums  Nums2
2020-01-01 05:00:00    10     10
2020-01-01 10:00:00    35     35
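The trace above says downsampling boils down to a groupby over the resampler's grouper. A rough public-API analogue, assuming that `pd.Grouper(freq=...)` builds the same 5-hour bins as `r.grouper` (which is internal), is to group explicitly and compare the results. Bools is cast to float first so that neither path drops it.

```python
import numpy as np
import pandas as pd

dr = pd.date_range("2020-01-01 05:00", periods=10, freq="h")
df = pd.DataFrame(
    {"Bools": [True, True, False, False, False, True, True, np.nan, np.nan, False],
     "Nums": range(10)},
    index=dr,
).astype({"Bools": float})  # make Bools numeric so both paths keep it

# Downsampling through the resampler...
via_resample = df.resample("5h").sum()

# ...and through an explicit time-based Grouper on the DatetimeIndex.
via_groupby = df.groupby(pd.Grouper(freq="5h")).sum()

print(via_resample)
print(via_groupby)
```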

User

Answer 2 · edited 2024-05-16 17:56:53

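df.resample('5H').sum() does not work on the Bools column because the column holds mixed data types (booleans and NaN), which pandas stores as object dtype. When sum() is called on a resample or groupby object, object-dtype columns are ignored.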
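A short sketch that checks the dtype and applies the same fix as the question's float cast. One caveat: pandas 2.0 and later no longer silently drop such "nuisance" object columns by default, so on newer versions the object column may be aggregated or may raise instead of being skipped. Casting to float first avoids depending on that version-specific behavior.

```python
import numpy as np
import pandas as pd

# Same data as the question ('h' is the hourly alias; older pandas also accepts 'H').
dr = pd.date_range("2020-01-01 05:00", periods=10, freq="h")
df = pd.DataFrame(
    {"Bools": [True, True, False, False, False, True, True, np.nan, np.nan, False],
     "Nums": range(10)},
    index=dr,
)

# Mixing Python bools with np.nan makes pandas store Bools as object dtype.
print(df.dtypes)

# float turns True/False into 1.0/0.0 and keeps NaN as missing,
# so sum() counts the True values per bin and skips the NaNs.
result = df.astype({"Bools": float}).resample("5h").sum()
print(result)
```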
