Grouping date ranges separated by rows where a column value is zero

Posted 2024-04-26 22:45:15


I have the following DataFrame:

            count
2015-09-28      2
2015-09-29      2
2015-09-30      0
2015-10-01      2
2015-10-02      3
2015-10-05      2
2015-10-06      1
2015-10-07      0
2015-10-08      1

I want to group the rows into date ranges, where the ranges are separated by rows with count == 0. I would like something like this:

  first      last       totalcount
1 2015-09-28 2015-09-29 4
2 2015-10-01 2015-10-06 8
3 2015-10-08 2015-10-08 1

1 answer
Anonymous user
Reply #1 · posted 2024-04-26 22:45:15

Use cumsum on the boolean mask (df['count'] == 0) to associate each row with a group number:

In [134]: df['groupno'] = (df['count'] == 0).cumsum()

In [135]: df
Out[135]: 
            count  groupno
2015-09-28      2        0
2015-09-29      2        0
2015-09-30      0        1
2015-10-01      2        1
2015-10-02      3        1
2015-10-05      2        1
2015-10-06      1        1
2015-10-07      0        2
2015-10-08      1        2
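
As a minimal sketch of why this works: the comparison produces a boolean Series, and cumsum treats True as 1 and False as 0, so the running total goes up by one at every zero row. The names counts and is_zero below are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Same count values as in the question, without the date index.
counts = pd.Series([2, 2, 0, 2, 3, 2, 1, 0, 1], name="count")

# True exactly on the separator rows (count == 0).
is_zero = counts == 0

# Running total of separators seen so far: each zero row starts a new group number.
groupno = is_zero.cumsum()

print(groupno.tolist())  # [0, 0, 1, 1, 1, 1, 1, 2, 2]
```

Note that each zero row is assigned to the group that follows it, which is why the zero rows have to be dropped before aggregating.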

You can then use groupby/agg to obtain the desired result:

import pandas as pd
df = pd.DataFrame({'count': [2, 2, 0, 2, 3, 2, 1, 0, 1]},
                  index=[u'2015-09-28', u'2015-09-29', u'2015-09-30', u'2015-10-01',
                         u'2015-10-02', u'2015-10-05', u'2015-10-06', u'2015-10-07',
                         u'2015-10-08'])


mask = (df['count'] == 0)
df['groupno'] = mask.cumsum()
# Remove the rows where the count is 0
df = df.loc[~mask]
# Make the index a column so we can use 'index':['first', 'last'] to find the
# first and last date in each group.
df = df.reset_index()
result = df.groupby('groupno').agg({'index':['first', 'last'], 'count':'sum'})
result.columns = result.columns.droplevel(0)
result = result.rename(columns={'sum':'totalcount'})

which yields

         totalcount       first        last
groupno                                    
0                 4  2015-09-28  2015-09-29
1                 8  2015-10-01  2015-10-06
2                 1  2015-10-08  2015-10-08
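
If you want the columns named and ordered as in the question (first, last, totalcount) and the groups numbered from 1, one possible variation is to use named aggregation. This is only a sketch, assuming pandas 0.25 or later (where named aggregation is available); the column name date and the variables kept and summary are hypothetical and not part of the code above.

```python
import pandas as pd

dates = ['2015-09-28', '2015-09-29', '2015-09-30', '2015-10-01',
         '2015-10-02', '2015-10-05', '2015-10-06', '2015-10-07',
         '2015-10-08']
df = pd.DataFrame({'count': [2, 2, 0, 2, 3, 2, 1, 0, 1]}, index=dates)

# Number the groups: the counter increases at every row where count == 0.
mask = df['count'] == 0
df['groupno'] = mask.cumsum()

# Drop the separator rows and turn the index into a column called 'date'
# (an illustrative name) so it can be aggregated.
kept = df.loc[~mask].rename_axis('date').reset_index()

# Named aggregation: output column = (input column, aggregation function).
summary = kept.groupby('groupno').agg(
    first=('date', 'first'),
    last=('date', 'last'),
    totalcount=('count', 'sum'),
)

# Renumber the groups 1..n, as in the layout requested in the question.
summary.index = range(1, len(summary) + 1)
print(summary)
```

Renumbering at the end also keeps the labels consecutive when the data starts with a zero or contains several zeros in a row, since those cases leave gaps in groupno.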
