Complex time operations in Pandas

Posted 2024-04-26 10:28:45


Here is a small sample of my very large DataFrame:

In [38]: df
Out[38]:
          Send_Customer         Pay_Customer           Send_Time
 0       1000000000284044644  1000000000251680999 2016-08-01 09:55:48
 1       2000000000223021617  1000000000190078650 2016-08-01 02:44:23
 2       2000000000289301033  1000000000309048473 2016-08-01 09:20:14
 3       1000000000333893941  1000000000333956151 2016-08-01 09:20:14
 4       1000000000340371553  2000000000103942022 2016-08-01 09:20:14
 5       2000000000098132192  2000000000089264458 2016-08-01 09:21:27
 6       1000000000007716594  2000000000144437513 2016-08-01 09:20:54
 7       1000000000135884145  1000000000278399847 2016-08-01 09:21:43
 8       2000000000141318366  2000000000151080468 2016-08-01 09:20:46
 9       1000000000056842546  2000000000139908360 2016-08-01 09:20:55
10       1000000000275051425  2000000000254558241 2016-08-01 09:20:17
11       1000000000162362467  1000000000340653197 2016-08-01 09:23:45
12       1000000000039529533  1000000000072903285 2016-08-01 09:22:56
13       1000000000034147075  2000000000079408765 2016-08-01 09:20:17
14       1000000000319501203  1000000000337830072 2016-08-01 09:20:20
15       1000000000025289495  2000000000287368163 2016-08-01 09:20:31
16       1000000000043110429  1000000000209850047 2016-08-01 09:22:33

I need to find out how many Pay_Customers, unique or non-unique, a given Send_Customer has within a 10-hour time span.

So the approach I am using is:

In [39]: df['time_diff'] = df.groupby('Send_Customer')['Send_Time'].apply(lambda x : x.diff().abs())

In [41]: df[df['time_diff']<=dt.timedelta(seconds=36000)]
Out[41]:

              Send_Customer         Pay_Customer           Send_Time time_diff
4361    1000000000284044644  1000000000326834813 2016-08-01 14:32:17  00:10:37
7530    2000000000223021617  1000000000340199555 2016-08-01 04:49:41  00:17:06
10937   2000000000148219588  1000000000312697109 2016-08-01 04:49:40  01:09:45
12876   1000000000339947901  2000000000218218239 2016-08-01 14:51:51  00:53:59
13553   1000000000248905073  1000000000248729812 2016-08-01 16:44:35  01:12:17
14281   2000000000270573223  1000000000341120021 2016-08-01 09:35:11  05:19:34

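This approach only partly works: using .diff() on ['Send_Time'] drops the first row of each group, the row the difference is measured from. Any ideas on how to keep such rows?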
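To see why those rows disappear, here is a minimal sketch on made-up data (the toy frame and its values are hypothetical, not the asker's real data). It uses the built-in per-group diff(), which should be equivalent to the apply(lambda ...) call above. The first row of every group has no predecessor, so its difference is NaT, and NaT compares as False against any timedelta threshold.

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame.
df = pd.DataFrame({
    "Send_Customer": [1, 1, 2, 2, 2],
    "Send_Time": pd.to_datetime([
        "2016-08-01 09:00:00",
        "2016-08-01 09:10:00",
        "2016-08-01 02:00:00",
        "2016-08-01 03:00:00",
        "2016-08-01 20:00:00",
    ]),
})

# Difference to the previous row within each Send_Customer group.
# The first row of each group has nothing before it, so it becomes NaT.
time_diff = df.groupby("Send_Customer")["Send_Time"].diff().abs()

# Comparing NaT with a timedelta yields False, so the boolean mask
# silently excludes the first row of every group.
mask = time_diff <= pd.Timedelta(hours=10)

print(time_diff)
print(mask)
```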


1 answer
User
#1 · Posted 2024-04-26 10:28:45

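If I understand correctly: the first row after diff is NaT. To keep that row, replace the NaT values with a value that the condition will not filter out, for example 0.

Here I simply append .fillna(0) to the end of the first statement: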

df['time_diff'] = df.groupby('Send_Customer')['Send_Time'].apply(
        lambda x : x.diff().abs()
    ).fillna(0)

df[df['time_diff'] <= dt.timedelta(seconds=36000)]
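A sketch of the same idea on hypothetical toy data, with two assumed substitutions that are not part of the answer above: pd.Timedelta(0) as the fill value instead of the integer 0 (recent pandas versions may not keep a timedelta dtype when the fill value is a plain integer), and the built-in per-group diff() in place of apply. All names and values are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame.
df = pd.DataFrame({
    "Send_Customer": [1, 1, 2, 2, 2],
    "Send_Time": pd.to_datetime([
        "2016-08-01 09:00:00",
        "2016-08-01 09:10:00",
        "2016-08-01 02:00:00",
        "2016-08-01 03:00:00",
        "2016-08-01 20:00:00",
    ]),
})

# Per-group time difference; NaT on the first row of each group is
# replaced by a zero-length timedelta so the column stays timedelta-typed.
df["time_diff"] = (
    df.groupby("Send_Customer")["Send_Time"]
    .diff()
    .abs()
    .fillna(pd.Timedelta(0))
)

# Rows within 10 hours of the previous row, plus each group's first row.
kept = df[df["time_diff"] <= pd.Timedelta(hours=10)]
print(kept)
```

Note that a zero fill keeps the first row of every group, including customers with only a single transaction, so whether that is wanted depends on how the rows are counted afterwards.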
