I've been trying to find, for each person in this dataset, the activity they were most involved in and the time they spent on it:
              name  activity            timestamp  money_spent
 0   Chandler Bing     party  2017-08-04 08:00:00           51
 1   Chandler Bing     party  2017-08-04 13:00:00           60
 2   Chandler Bing     party  2017-08-04 15:00:00           59
 5      Harry Kane     party  2017-08-04 07:00:00           68
 4      Harry Kane     party  2017-08-04 11:00:00           90
 3      Harry Kane  football  2017-08-04 13:00:00           80
11  Joey Tribbiani  football  2017-08-04 08:00:00           84
 9  Joey Tribbiani     party  2017-08-04 09:00:00           54
10  Joey Tribbiani     party  2017-08-04 10:00:00           67
 6        John Doe     beach  2017-08-04 07:00:00           63
 7        John Doe     beach  2017-08-04 12:00:00           61
 8        John Doe     beach  2017-08-04 14:00:00           65
12   Monica Geller    travel  2017-08-04 07:00:00           90
13   Monica Geller    travel  2017-08-04 08:00:00           96
14   Monica Geller    travel  2017-08-04 09:00:00           74
15   Phoebe Buffey    travel  2017-08-04 10:00:00           52
16   Phoebe Buffey    travel  2017-08-04 12:00:00           84
17   Phoebe Buffey  football  2017-08-04 15:00:00           58
18     Ross Geller     party  2017-08-04 09:00:00           96
19     Ross Geller     party  2017-08-04 11:00:00           81
20     Ross Geller    travel  2017-08-04 14:00:00           60
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S')
df  # party day 2017-08-04 for some guys.
# find most involved activity and time spent on that activity per person.
Desired output:
                activity_num  activity  time_diff
name
Chandler Bing            1.0     party   07:00:00
Harry Kane               2.0     party   04:00:00
Joey Tribbiani           2.0     party   02:00:00
John Doe                 1.0     beach   07:00:00
Monica Geller            1.0    travel   02:00:00
Phoebe Buffey            2.0    travel   03:00:00
Ross Geller              2.0    travel   03:00:00
Note: Harry Kane was at the party from 7 am to 11 am, so his result is 4 hours.
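To make the rule in the note concrete, here is a minimal sketch for Harry Kane alone. It assumes "most involved" means the activity with the most rows, and that the time spent is the last timestamp minus the first timestamp of that activity. The names `harry`, `top_activity`, `party_times`, and `time_diff` are made up for this illustration.

```python
import pandas as pd

# Harry Kane's three rows, copied by hand from the data above.
harry = pd.DataFrame({
    "activity": ["party", "party", "football"],
    "timestamp": pd.to_datetime([
        "2017-08-04 07:00:00",
        "2017-08-04 11:00:00",
        "2017-08-04 13:00:00",
    ]),
})

# Assumption: the most involved activity is the one with the most rows.
top_activity = harry["activity"].value_counts().idxmax()

# Assumption: time spent = last timestamp minus first timestamp of that activity.
party_times = harry.loc[harry["activity"] == top_activity, "timestamp"]
time_diff = party_times.max() - party_times.min()

print(top_activity, time_diff)
```

Here the party rows run from 07:00 to 11:00, so `time_diff` is a `Timedelta` of 4 hours.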
df.head()
            name  activity            timestamp  money_spent
0  Chandler Bing     party  2017-08-04 08:00:00           51
1  Chandler Bing     party  2017-08-04 13:00:00           60
2  Chandler Bing     party  2017-08-04 15:00:00           59
3     Harry Kane  football  2017-08-04 13:00:00           80
4     Harry Kane     party  2017-08-04 11:00:00           90
5     Harry Kane     party  2017-08-04 07:00:00           68
My attempt:
df.groupby(['name','activity'])['timestamp'].max() # no idea
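The attempt above only returns the latest timestamp per (name, activity) pair. One possible way to extend it, sketched here under assumptions rather than as a definitive solution: aggregate the row count plus the first and last timestamp per pair, keep the pair with the most rows for each person, and take `activity_num` to be the number of distinct activities per person. The sketch uses a hand-built three-person subset of the data, and the names `stats`, `top`, and `result` are invented for illustration.

```python
import pandas as pd

# Three-person subset of the question's data, rebuilt by hand.
df = pd.DataFrame({
    "name": ["Chandler Bing"] * 3 + ["Harry Kane"] * 3 + ["John Doe"] * 3,
    "activity": ["party", "party", "party",
                 "party", "party", "football",
                 "beach", "beach", "beach"],
    "timestamp": pd.to_datetime([
        "2017-08-04 08:00:00", "2017-08-04 13:00:00", "2017-08-04 15:00:00",
        "2017-08-04 07:00:00", "2017-08-04 11:00:00", "2017-08-04 13:00:00",
        "2017-08-04 07:00:00", "2017-08-04 12:00:00", "2017-08-04 14:00:00",
    ]),
})

# Per (name, activity): number of rows, first and last timestamp.
stats = (
    df.groupby(["name", "activity"])["timestamp"]
      .agg(rows="count", start="min", end="max")
      .reset_index()
)
stats["time_diff"] = stats["end"] - stats["start"]

# Keep, for each person, the activity with the most rows.
# Ties (none in this subset) would be broken by sort order.
top = (
    stats.sort_values(["name", "rows"], ascending=[True, False])
         .drop_duplicates("name")
         .set_index("name")
)

# Assumption: activity_num is the number of distinct activities per person.
top["activity_num"] = df.groupby("name")["activity"].nunique()

result = top[["activity_num", "activity", "time_diff"]]
print(result)
```

On this subset the sketch reproduces the desired rows for Chandler Bing, Harry Kane, and John Doe, except that `activity_num` comes out as an integer rather than a float. Under the "last minus first" rule, some other rows of the desired output (for example Joey Tribbiani, whose party rows are at 09:00 and 10:00) would not match, so the intended definition of `time_diff` may differ from the one assumed here.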
This is definitely (probably) not the right way to do it, but let's see:
Output:
See below.
Try this:
Output: