Selecting rows within a given range from a grouped object

Posted 2024-04-25 00:13:24


I have a DataFrame that looks like this:

+------------+---------------------+---------+
|    action  |  ts                 |   uid   |
+------------+---------------------+---------+
| action1    | 2013-01-01 00:00:00 | 543534  |  
| action2    | 2013-01-01 00:00:00 | 543544  |
| action1    | 2013-01-01 00:00:02 | 543542  |
| action2    | 2013-01-01 00:00:03 | 543541  |
|   ....     |       ....          |   ...   |
+------------+---------------------+---------+

I want to count how many actions of each type each user performed within a given time range, so the expected output looks something like this:

    uid action1 action2
543534    10      1
543534    0      2
 ...

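My idea was to apply .groupby('uid') first, then iterate over the grouped object, select the rows whose ts falls within the given range, and finally combine the per-user results into a single result DataFrame.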

So, something like this:

df = ...
start_date = ...
end_date = ...
result = {}

grouped = df.groupby('uid')
grouped_dict = dict(list(grouped))

for item in grouped_dict:
    group_df = grouped_dict[item]
    # Count the rows whose ts lies strictly inside the range
    result[item] = len(group_df[(group_df.ts > start_date) & (group_df.ts < end_date)])

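I haven't run this code, but I suspect that even if it works, it is very inefficient. Just converting the grouped object to a dictionary takes a lot of time. What would be a more efficient approach in this case?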


2 answers

You can filter by the time range first and then group by uid and action:

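import pandas as pd

start_date = pd.to_datetime('2013-01-01 00:00:00')
end_date = pd.to_datetime('2013-01-01 00:00:07')
print(df)
# Keep rows strictly inside the range, count per (uid, action), pivot actions into columns
in_range = df[(df.ts > start_date) & (df.ts < end_date)]
print(in_range.groupby(['uid', 'action'])['ts'].count().unstack('action').fillna(0))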

Output:

    action                  ts  uid
0  action1 2013-01-01 00:00:00    1
1  action2 2013-01-01 00:00:00    2
2  action1 2013-01-01 00:00:02    2
3  action2 2013-01-01 00:00:03    1
4  action2 2013-01-01 00:00:04    2
5  action2 2013-01-01 00:00:05    1
6  action1 2013-01-01 00:00:06    1
action  action1  action2
uid                     
1             1        2
2             1        1
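As a self-contained illustration, the sketch below rebuilds the sample frame from the printed output above and produces the same counts. Note that it swaps in `pd.crosstab` as an alternative to the groupby/count/unstack chain; this is an assumption about what may read more simply, not the answerer's method. The variable names `in_range` and `counts` are chosen here for illustration.

```python
import pandas as pd

# Sample data reconstructed from the printed frame in the answer above.
df = pd.DataFrame({
    "action": ["action1", "action2", "action1", "action2",
               "action2", "action2", "action1"],
    "ts": pd.to_datetime([
        "2013-01-01 00:00:00", "2013-01-01 00:00:00", "2013-01-01 00:00:02",
        "2013-01-01 00:00:03", "2013-01-01 00:00:04", "2013-01-01 00:00:05",
        "2013-01-01 00:00:06",
    ]),
    "uid": [1, 2, 2, 1, 2, 1, 1],
})

start_date = pd.to_datetime("2013-01-01 00:00:00")
end_date = pd.to_datetime("2013-01-01 00:00:07")

# Keep only rows strictly inside the window (both endpoints excluded).
in_range = df[(df["ts"] > start_date) & (df["ts"] < end_date)]

# Cross-tabulate uid against action; absent combinations come out as 0.
counts = pd.crosstab(in_range["uid"], in_range["action"])
print(counts)
```

If the endpoints should be included, `df["ts"].between(start_date, end_date)` can serve as the mask instead, since it is inclusive on both sides by default.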

Looking at the pandas.DataFrame interface, I would select the data like this:

# Select the rows inside the date range of interest
bydate = df[(df['ts'] > start_date) & (df['ts'] < end_date)]
# Group by uid first, *then* by action
grouped = bydate.groupby(['uid', 'action'])

Now let's simply print the count of each action for each uid:

for indices, data in grouped:
    print("Uid {}, Action '{}': {}".format(indices[0], indices[1], len(data)))
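The loop above materialises every group just to measure its length. A possible shortcut, sketched here under the assumption that only the counts are needed, is to call `.size()` on the grouped object, which returns one count per (uid, action) pair. The sample data is borrowed from the other answer's printed frame, and the name `sizes` is illustrative.

```python
import pandas as pd

# Assumed sample data, taken from the frame printed in the other answer.
df = pd.DataFrame({
    "action": ["action1", "action2", "action1", "action2",
               "action2", "action2", "action1"],
    "ts": pd.to_datetime([
        "2013-01-01 00:00:00", "2013-01-01 00:00:00", "2013-01-01 00:00:02",
        "2013-01-01 00:00:03", "2013-01-01 00:00:04", "2013-01-01 00:00:05",
        "2013-01-01 00:00:06",
    ]),
    "uid": [1, 2, 2, 1, 2, 1, 1],
})
start_date = pd.to_datetime("2013-01-01 00:00:00")
end_date = pd.to_datetime("2013-01-01 00:00:07")

# Filter once, then let pandas count the rows in each (uid, action) group.
bydate = df[(df["ts"] > start_date) & (df["ts"] < end_date)]
sizes = bydate.groupby(["uid", "action"]).size()

# sizes is a Series indexed by (uid, action); iterate over it to print.
for (uid, action), n in sizes.items():
    print("Uid {}, Action '{}': {}".format(uid, action, n))
```

Calling `sizes.unstack(fill_value=0)` would turn this Series into the wide uid-by-action table the question asks for.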
