Suppose you have a DataFrame like this:
In [50]: s="""time source id
...: 4-25-2014 A 1
...: 5-4-2014 A 1
...: 5-5-2014 A 1
...: 4-2-2013 B 12
...: 4-6-2013 B 12
...: 4-11-2013 B 12
...: 4-12-2013 B 12
...: 4-12-2013 B 12"""
In [51]: from io import StringIO
...: import datetime as dt
...: import pandas as pd
...: df = pd.read_csv(StringIO(s), sep=r"\s+")
In [52]: df['time'] = pd.to_datetime(df['time'])
In [53]: df
Out[53]:
        time source  id
0 2014-04-25      A   1
1 2014-05-04      A   1
2 2014-05-05      A   1
3 2013-04-02      B  12
4 2013-04-06      B  12
5 2013-04-11      B  12
6 2013-04-12      B  12
7 2013-04-12      B  12
Then you can select the desired rows in an apply call on the groupby object:
In [57]: g = df.groupby(['source', 'id'])
In [58]: g.apply(lambda x : x[x['time'] > (x['time'].iloc[-1] - dt.timedelta(7))])
Out[58]:
                  time source  id
source id
A      1  1 2014-05-04      A   1
          2 2014-05-05      A   1
B      12 4 2013-04-06      B  12
          5 2013-04-11      B  12
          6 2013-04-12      B  12
          7 2013-04-12      B  12
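The apply call above uses x['time'].iloc[-1] as the latest timestamp, so it relies on each group already being sorted by time. If that ordering cannot be guaranteed, one possible variant (a sketch, not part of the original answer) is to broadcast each group's maximum time with transform and filter with a boolean mask. The names latest and recent, and the explicit format string, are assumptions introduced here for illustration.

```python
from io import StringIO

import pandas as pd

s = """time source id
4-25-2014 A 1
5-4-2014 A 1
5-5-2014 A 1
4-2-2013 B 12
4-6-2013 B 12
4-11-2013 B 12
4-12-2013 B 12
4-12-2013 B 12"""

df = pd.read_csv(StringIO(s), sep=r"\s+")
# Assumption: the dates are month-day-year, as in the sample data above.
df["time"] = pd.to_datetime(df["time"], format="%m-%d-%Y")

# Latest timestamp of each (source, id) group, aligned to the original rows.
latest = df.groupby(["source", "id"])["time"].transform("max")

# Keep rows that fall within the 7 days before their group's latest timestamp.
recent = df[df["time"] > latest - pd.Timedelta(days=7)]
print(recent)
```

This keeps the original flat index instead of the (source, id) MultiIndex that apply produces, which may or may not be what you want.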