Finding the mean for weekday groups in a pandas DataFrame

Posted 2024-05-13 05:48:59


My dataset looks like this:

         tripduration           starttime   User Type
0                 732   7/1/2015 00:00:03  Subscriber
1                 322   7/1/2015 00:00:06  Subscriber
2                 790   7/1/2015 00:00:17  Subscriber
3                1228   7/1/2015 00:00:23  Subscriber
4                1383   7/1/2015 00:00:44  Subscriber
5                 603   7/1/2015 00:01:00  Subscriber
6                 520   7/1/2015 00:01:03  Subscriber
7                 289   7/1/2015 00:01:06  Subscriber
8                1771   7/1/2015 00:01:25    Customer
9                 813   7/1/2015 00:01:41  Subscriber
10               1735   7/1/2015 00:01:50    Customer
11                832   7/1/2015 00:01:58  Subscriber
12               1210   7/1/2015 00:02:06  Subscriber
13                746   7/1/2015 00:02:07  Subscriber
14                749   7/1/2015 00:02:26  Subscriber
15                463   7/1/2015 00:02:26  Subscriber
16                331   7/1/2015 00:02:35  Subscriber
17                951   7/1/2015 00:02:43    Customer
18               1352   7/1/2015 00:02:47    Customer
19                275   7/1/2015 00:02:47  Subscriber
20                199   7/1/2015 00:03:05  Subscriber
21                383   7/1/2015 00:03:16    Customer
22               4210   7/1/2015 00:03:27  Subscriber
23                584   7/1/2015 00:03:34  Subscriber
24                735   7/1/2015 00:03:48  Subscriber
25                827   7/1/2015 00:03:56  Subscriber
26                677   7/1/2015 00:03:57  Subscriber
27               2371   7/1/2015 00:03:58    Customer
28                666   7/1/2015 00:04:03  Subscriber
29                999   7/1/2015 00:04:17  Subscriber
...               ...                 ...         ...
1085646           243  7/31/2015 23:57:25  Subscriber
1085647          1378  7/31/2015 23:57:29    Customer
1085648           230  7/31/2015 23:57:32  Subscriber
1085649          1669  7/31/2015 23:57:33  Subscriber
1085650           493  7/31/2015 23:57:44  Subscriber
1085651           822  7/31/2015 23:57:54  Subscriber
1085652           617  7/31/2015 23:58:03  Subscriber
1085653           349  7/31/2015 23:58:08  Subscriber
1085654           818  7/31/2015 23:58:12    Customer
1085655          2062  7/31/2015 23:58:15  Subscriber
1085656           945  7/31/2015 23:58:18    Customer
1085657           346  7/31/2015 23:58:24  Subscriber
1085658           399  7/31/2015 23:58:27  Subscriber
1085659           641  7/31/2015 23:58:42  Subscriber
1085660          1872  7/31/2015 23:58:43  Subscriber
1085661         12065  7/31/2015 23:58:51    Customer
1085662           265  7/31/2015 23:58:53  Subscriber
1085663           936  7/31/2015 23:58:58  Subscriber
1085664           395  7/31/2015 23:59:04  Subscriber
1085665           238  7/31/2015 23:59:10  Subscriber
1085666           551  7/31/2015 23:59:24  Subscriber
1085667           423  7/31/2015 23:59:23    Customer
1085668          1623  7/31/2015 23:59:24  Subscriber
1085669          1632  7/31/2015 23:59:24  Subscriber
1085670           305  7/31/2015 23:59:38  Subscriber
1085671           275  7/31/2015 23:59:40  Subscriber
1085672           530  7/31/2015 23:59:41  Subscriber
1085673           273  7/31/2015 23:59:42    Customer
1085674          1273  7/31/2015 23:59:56  Subscriber
1085675          1667  7/31/2015 23:59:59  Subscriber

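My question

What is the average trip duration for subscribers on any weekday (Monday through Friday)?

My code

The function a4() should return the mean as a float rounded to two decimal places: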

[code block not preserved in the source]

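I am stuck on selecting the weekdays (Monday through Friday) so that I can compute the mean of tripduration. I tried to parse starttime with parser.parse(df1['starttime']), but got an error:

TypeError: Parser must be a string or character stream, not Series

What is the correct way to get the weekday mean?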


2 answers

I think you first need to convert starttime with to_datetime.

Then filter with a boolean mask.
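The TypeError in the question arises because dateutil's parser.parse expects a single string, whereas pd.to_datetime accepts a whole column. Below is a minimal sketch of the two steps, assuming the timestamps follow the month/day/year layout shown in the question. The three-row frame `sample` is made up for illustration, and its last row (a Saturday) is not from the dataset.

```python
import pandas as pd

# Hypothetical sample: two rows copied from the question plus one made-up Saturday row
sample = pd.DataFrame({
    'tripduration': [732, 1771, 600],
    'starttime': ['7/1/2015 00:00:03', '7/1/2015 00:01:25', '7/4/2015 10:00:00'],
    'User Type': ['Subscriber', 'Customer', 'Subscriber'],
})

# Convert the whole column at once; the explicit format is an assumption
# based on the rows shown in the question
sample['starttime'] = pd.to_datetime(sample['starttime'], format='%m/%d/%Y %H:%M:%S')

# dayofweek counts Monday as 0 and Sunday as 6
print(sample['starttime'].dt.dayofweek.tolist())   # [2, 2, 5]

# Boolean mask that keeps Monday through Friday
is_weekday = sample['starttime'].dt.dayofweek < 5
print(is_weekday.tolist())                         # [True, True, False]
```

July 1, 2015 was a Wednesday, so the rows from the question get day number 2, and the made-up Saturday row is the only one the mask drops.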

If you need a single scalar value across all weekdays, use loc to select the column and then take its mean:

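import pandas as pd

def a4(rides):
    # Parse the timestamp strings so that the .dt accessor becomes available
    rides['starttime'] = pd.to_datetime(rides['starttime'])
    # dayofweek: Monday=0 ... Sunday=6, so < 5 keeps Monday through Friday
    m = (rides['starttime'].dt.dayofweek < 5) & (rides['User Type'] == 'Subscriber')
    return round(rides.loc[m, 'tripduration'].mean(), 2)

print(a4(rides))
825.33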
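Note that a4 as written overwrites the caller's starttime column. If that side effect is unwanted, a variant can keep the parsed timestamps in a local variable. This is a sketch rather than the answerer's code; the name `a4_no_mutation` and the four-row `rides` frame are hypothetical.

```python
import pandas as pd

def a4_no_mutation(rides):
    """Mean weekday trip duration for subscribers, leaving `rides` unchanged."""
    start = pd.to_datetime(rides['starttime'])   # local Series, not written back
    mask = (start.dt.dayofweek < 5) & (rides['User Type'] == 'Subscriber')
    return round(rides.loc[mask, 'tripduration'].mean(), 2)

# Hypothetical data: three rows from the question plus one made-up Saturday row
rides = pd.DataFrame({
    'tripduration': [732, 322, 1771, 600],
    'starttime': ['7/1/2015 00:00:03', '7/1/2015 00:00:06',
                  '7/1/2015 00:01:25', '7/4/2015 10:00:00'],
    'User Type': ['Subscriber', 'Subscriber', 'Customer', 'Subscriber'],
})

print(a4_no_mutation(rides))   # 527.0
```

Only the two Wednesday subscriber rows pass the mask, so the result is (732 + 322) / 2, and the starttime column of `rides` still holds the original strings afterwards.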

If you need the mean for each day separately, add the new condition with boolean indexing, then use groupby with the mean aggregation:

[code block not preserved in the source]
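As a rough sketch of the per-day version (not the answerer's original code), the same mask can feed a groupby on the day-of-week number. The function name `mean_by_weekday` and the sample rows dated after 7/1/2015 are assumptions made for illustration.

```python
import pandas as pd

def mean_by_weekday(rides):
    """Mean subscriber trip duration per weekday number (Monday=0 ... Friday=4)."""
    start = pd.to_datetime(rides['starttime'])
    day = start.dt.dayofweek.rename('dayofweek')
    mask = (day < 5) & (rides['User Type'] == 'Subscriber')
    # Group the filtered durations by the matching day numbers
    return rides.loc[mask, 'tripduration'].groupby(day[mask]).mean().round(2)

# Hypothetical data: the 7/1/2015 rows come from the question, the rest are made up
rides = pd.DataFrame({
    'tripduration': [732, 322, 1771, 500, 700, 600],
    'starttime': ['7/1/2015 00:00:03', '7/1/2015 00:00:06', '7/1/2015 00:01:25',
                  '7/2/2015 08:00:00', '7/2/2015 09:00:00', '7/4/2015 10:00:00'],
    'User Type': ['Subscriber', 'Subscriber', 'Customer',
                  'Subscriber', 'Subscriber', 'Subscriber'],
})

result = mean_by_weekday(rides)
print(result.to_dict())   # {2: 527.0, 3: 600.0}
```

The result is a Series indexed by day number, so a day with no subscriber trips simply does not appear.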

If you do not want day numbers, use [name not preserved in the source]
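For day names rather than numbers, one option in recent pandas versions is Series.dt.day_name(). Whether this is the accessor the answer had in mind is an assumption. A small sketch, with the first trip taken from the question and the other two made up:

```python
import pandas as pd

# First timestamp is from the question; the other two are made-up examples
start = pd.to_datetime(pd.Series(['7/1/2015 00:00:03',
                                  '7/2/2015 08:15:00',
                                  '7/4/2015 10:00:00']))
duration = pd.Series([732, 500, 600])

# Keep Monday through Friday, then group by the English day name
weekday = start.dt.dayofweek < 5
by_name = duration[weekday].groupby(start[weekday].dt.day_name()).mean()
print(by_name.to_dict())   # {'Thursday': 500.0, 'Wednesday': 732.0}
```

Grouping by name sorts the result alphabetically rather than in calendar order, which is worth keeping in mind when the output is plotted or reported.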

[content not preserved in the source]
df = pd.read_csv(...., parse_dates=['starttime'])
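To illustrate parsing at read time, the sketch below reads a few rows from an in-memory string instead of a file. The CSV text and the use of io.StringIO are stand-ins for the real file, whose path the answer leaves elided. parse_dates is given as a list of column names here.

```python
import io
import pandas as pd

# Stand-in for the real CSV file: two rows copied from the question
csv_text = """tripduration,starttime,User Type
732,7/1/2015 00:00:03,Subscriber
1771,7/1/2015 00:01:25,Customer
"""

# parse_dates takes a list of columns to convert while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['starttime'])

# The column is already datetime-typed, so the .dt accessor works right away
print(df['starttime'].dt.dayofweek.tolist())   # [2, 2]
```

Parsing during the read avoids a separate pd.to_datetime call on the column afterwards.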

Filter with boolean indexing, then group by dayofweek to compute the mean.

[code block not preserved in the source]
