Python: How to find the time difference based on the max of other columns?

              name  activity            timestamp  money_spent
0    Chandler Bing     party  2017-08-04 08:00:00           51
1    Chandler Bing     party  2017-08-04 13:00:00           60
2    Chandler Bing     party  2017-08-04 15:00:00           59
5       Harry Kane     party  2017-08-04 07:00:00           68
4       Harry Kane     party  2017-08-04 11:00:00           90
3       Harry Kane  football  2017-08-04 13:00:00           80
11  Joey Tribbiani  football  2017-08-04 08:00:00           84
9   Joey Tribbiani     party  2017-08-04 09:00:00           54
10  Joey Tribbiani     party  2017-08-04 10:00:00           67
6         John Doe     beach  2017-08-04 07:00:00           63
7         John Doe     beach  2017-08-04 12:00:00           61
8         John Doe     beach  2017-08-04 14:00:00           65
12   Monica Geller    travel  2017-08-04 07:00:00           90
13   Monica Geller    travel  2017-08-04 08:00:00           96
14   Monica Geller    travel  2017-08-04 09:00:00           74
15   Phoebe Buffey    travel  2017-08-04 10:00:00           52
16   Phoebe Buffey    travel  2017-08-04 12:00:00           84
17   Phoebe Buffey  football  2017-08-04 15:00:00           58
18     Ross Geller     party  2017-08-04 09:00:00           96
19     Ross Geller     party  2017-08-04 11:00:00           81
20     Ross Geller    travel  2017-08-04 14:00:00           60

df['timestamp'] = pd.to_datetime(df.timestamp, format='%Y-%m-%d %H:%M:%S')
df
# party day 2017-08-04 for some guys.
# find most involved activity and time spent on that activity per person.

Desired output:

                activity_num  activity  time_diff
name
Chandler Bing            1.0     party   07:00:00
Harry Kane               2.0     party   04:00:00
Joey Tribbiani           2.0     party   02:00:00
John Doe                 1.0     beach   07:00:00
Monica Geller            1.0    travel   02:00:00
Phoebe Buffey            2.0    travel   03:00:00
Ross Geller              2.0    travel   03:00:00

df.head()
            name  activity            timestamp  money_spent
0  Chandler Bing     party  2017-08-04 08:00:00           51
1  Chandler Bing     party  2017-08-04 13:00:00           60
2  Chandler Bing     party  2017-08-04 15:00:00           59
3     Harry Kane  football  2017-08-04 13:00:00           80
4     Harry Kane     party  2017-08-04 11:00:00           90
5     Harry Kane     party  2017-08-04 07:00:00           68
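A minimal, self-contained sketch of the whole task, assuming "most involved" means the activity with the longest span between its first and last timestamp, and that `activity_num` should be the number of distinct activities per person. It rebuilds the sample rows shown above; `longest_activity_span` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd

# Sample rows from the question: (name, activity, timestamp, money_spent)
rows = [
    ("Chandler Bing", "party", "2017-08-04 08:00:00", 51),
    ("Chandler Bing", "party", "2017-08-04 13:00:00", 60),
    ("Chandler Bing", "party", "2017-08-04 15:00:00", 59),
    ("Harry Kane", "party", "2017-08-04 07:00:00", 68),
    ("Harry Kane", "party", "2017-08-04 11:00:00", 90),
    ("Harry Kane", "football", "2017-08-04 13:00:00", 80),
    ("Joey Tribbiani", "football", "2017-08-04 08:00:00", 84),
    ("Joey Tribbiani", "party", "2017-08-04 09:00:00", 54),
    ("Joey Tribbiani", "party", "2017-08-04 10:00:00", 67),
    ("John Doe", "beach", "2017-08-04 07:00:00", 63),
    ("John Doe", "beach", "2017-08-04 12:00:00", 61),
    ("John Doe", "beach", "2017-08-04 14:00:00", 65),
    ("Monica Geller", "travel", "2017-08-04 07:00:00", 90),
    ("Monica Geller", "travel", "2017-08-04 08:00:00", 96),
    ("Monica Geller", "travel", "2017-08-04 09:00:00", 74),
    ("Phoebe Buffey", "travel", "2017-08-04 10:00:00", 52),
    ("Phoebe Buffey", "travel", "2017-08-04 12:00:00", 84),
    ("Phoebe Buffey", "football", "2017-08-04 15:00:00", 58),
    ("Ross Geller", "party", "2017-08-04 09:00:00", 96),
    ("Ross Geller", "party", "2017-08-04 11:00:00", 81),
    ("Ross Geller", "travel", "2017-08-04 14:00:00", 60),
]
df = pd.DataFrame(rows, columns=["name", "activity", "timestamp", "money_spent"])
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")


def longest_activity_span(frame):
    """Hypothetical helper: per name, the activity whose max - min timestamp is largest."""
    # Span (latest minus earliest timestamp) of each (name, activity) group
    g = frame.groupby(["name", "activity"])["timestamp"]
    span = (g.max() - g.min()).rename("time_diff").reset_index()
    # idxmax gives the row label of the first largest span for each name
    best = span.loc[span.groupby("name")["time_diff"].idxmax()]
    # Number of distinct activities per name
    n_act = frame.groupby("name")["activity"].nunique().rename("activity_num")
    return best.set_index("name").join(n_act)[["activity_num", "activity", "time_diff"]]


print(longest_activity_span(df))
```

Because `idxmax` returns the first maximum, a person with two equally long activities gets the alphabetically first one here; pick a different tie-break if that matters.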

3 answers

User

Answer 1 · edited 2024-05-15 06:31:13

This is definitely (probably) not the way to do it, but let's see:

### Get the rows with the max and the min timestamp of each (name, activity) group into separate dataframes
df_max = df.loc[df.groupby(['name','activity',])['timestamp'].idxmax()].reset_index(drop=True)
df_min = df.loc[df.groupby(['name','activity',])['timestamp'].idxmin()].reset_index(drop=True)

### Merge those puppies on the index values
df_tot = df_max.merge(df_min, how='outer', left_index=True, right_index=True, suffixes= ('_max', '_min'))

### Subtract the minimum timestamp from the maximum timestamp
df_tot['net time'] = df_tot['timestamp_max'] - df_tot['timestamp_min']

### Drop unnecessary columns
df_tot.drop(['name_min','activity_min','timestamp_min','money_spent_min', 'money_spent_max','timestamp_max'], axis=1, inplace=True)

### Rename our columns
df_tot = df_tot.rename(columns={i:i.replace('_max', '') for i in df_tot.columns.values.tolist()})

### Set activity_number as the cumulative count of name
df_tot['activity_number'] = df_tot.groupby('name').cumcount() + 1

### Get the max of that result
df_tot = df_tot.loc[df_tot.groupby(['name',])['net time'].idxmax()].reset_index(drop=True)

### Rearrange our results
df_tot = df_tot.reindex(columns=['name','activity_number', 'net time']).copy()

Output:

             name  activity_number net time
0   Chandler Bing                1 07:00:00
1      Harry Kane                2 04:00:00
2  Joey Tribbiani                2 01:00:00
3        John Doe                1 07:00:00
4   Monica Geller                1 02:00:00
5   Phoebe Buffey                2 02:00:00
6     Ross Geller                1 02:00:00
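The separate min/max frames and the index merge above can be collapsed into one named aggregation. This is a sketch on a few of the question's rows, assuming pandas 0.25 or later (where named aggregation was introduced). As in the answer, `activity_number` is the 1-based position of the chosen activity among that person's activities in alphabetical (groupby) order, not a count of activities.

```python
import pandas as pd

# A few rows from the question's sample data
df = pd.DataFrame({
    "name": ["Harry Kane"] * 3 + ["Ross Geller"] * 3,
    "activity": ["party", "party", "football", "party", "party", "travel"],
    "timestamp": pd.to_datetime([
        "2017-08-04 07:00:00", "2017-08-04 11:00:00", "2017-08-04 13:00:00",
        "2017-08-04 09:00:00", "2017-08-04 11:00:00", "2017-08-04 14:00:00",
    ]),
})

# Earliest and latest timestamp of each (name, activity) in one pass
tot = (df.groupby(["name", "activity"])
         .agg(start=("timestamp", "min"), end=("timestamp", "max"))
         .reset_index())
tot["net time"] = tot["end"] - tot["start"]

# Same numbering as the answer: 1-based position within each name (alphabetical)
tot["activity_number"] = tot.groupby("name").cumcount() + 1

# Keep the row with the longest span per name
idx = tot.groupby("name")["net time"].idxmax()
result = tot.loc[idx, ["name", "activity_number", "net time"]].reset_index(drop=True)
print(result)
```

On these rows it reproduces the answer's values for Harry Kane (2, 04:00:00) and Ross Geller (1, 02:00:00).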

User

Answer 2 · edited 2024-05-15 06:31:13

Check the code below:

s=df.groupby(['name','activity']).timestamp.apply(lambda t: t.max() - t.min()).reset_index()
# pd.Series.ptp was removed in pandas 1.0; the lambda computes the same max - min range
uni=s.groupby('name').activity.nunique()
s=s.sort_values('timestamp').drop_duplicates('name',keep='last')
s['numberofact']=s.name.map(uni)
s
             name activity timestamp  numberofact
4  Joey Tribbiani    party  01:00:00            2
6   Monica Geller   travel  02:00:00            1
8   Phoebe Buffey   travel  02:00:00            2
9     Ross Geller    party  02:00:00            2
2      Harry Kane    party  04:00:00            2
0   Chandler Bing    party  07:00:00            1
5        John Doe    beach  07:00:00            1
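`sort_values` defaults to quicksort, which is not stable, so when two of a person's activities have the same span, which one `keep='last'` retains is not guaranteed. Below is a sketch of the same approach with an explicit max - min (instead of the removed `pd.Series.ptp`) and a stable sort via `kind="mergesort"`, on a few rows from the question.

```python
import pandas as pd

# A few rows from the question's sample data
df = pd.DataFrame({
    "name": ["Phoebe Buffey"] * 3 + ["Ross Geller"] * 3,
    "activity": ["travel", "travel", "football", "party", "party", "travel"],
    "timestamp": pd.to_datetime([
        "2017-08-04 10:00:00", "2017-08-04 12:00:00", "2017-08-04 15:00:00",
        "2017-08-04 09:00:00", "2017-08-04 11:00:00", "2017-08-04 14:00:00",
    ]),
})

# Range (max - min) of each (name, activity) group, replacing pd.Series.ptp
s = (df.groupby(["name", "activity"])["timestamp"]
       .apply(lambda t: t.max() - t.min())
       .reset_index())

# Number of distinct activities per person
uni = s.groupby("name")["activity"].nunique()

# Stable sort: equal spans keep groupby's (alphabetical) order, so on a tie
# keep="last" retains the alphabetically last activity of that person
s = (s.sort_values("timestamp", kind="mergesort")
      .drop_duplicates("name", keep="last"))
s = s.assign(numberofact=s["name"].map(uni))
print(s)
```

Using `assign` instead of `s['numberofact'] = ...` also sidesteps a possible `SettingWithCopyWarning` on the frame returned by `drop_duplicates`.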

User

Answer 3 · edited 2024-05-15 06:31:13

Try this:

gb = df.groupby(['name', 'activity'])['timestamp']

print((gb.max() - gb.min()).sort_values(ascending=False).reset_index().drop_duplicates(subset='name'))

Output:

             name activity timestamp
0        John Doe    beach  07:00:00
1   Chandler Bing    party  07:00:00
2      Harry Kane    party  04:00:00
3     Ross Geller    party  02:00:00
4   Phoebe Buffey   travel  02:00:00
5   Monica Geller   travel  02:00:00
6  Joey Tribbiani    party  01:00:00
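This output omits the number of activities per person that the question asks for. A sketch that attaches it by mapping each name to its count of distinct activities, reusing the same `gb.max() - gb.min()` span on a few rows from the question; the column names `time_diff` and `activity_num` are borrowed from the question's desired output.

```python
import pandas as pd

# A few rows from the question's sample data
df = pd.DataFrame({
    "name": ["Harry Kane"] * 3 + ["Joey Tribbiani"] * 3,
    "activity": ["party", "party", "football", "football", "party", "party"],
    "timestamp": pd.to_datetime([
        "2017-08-04 07:00:00", "2017-08-04 11:00:00", "2017-08-04 13:00:00",
        "2017-08-04 08:00:00", "2017-08-04 09:00:00", "2017-08-04 10:00:00",
    ]),
})

gb = df.groupby(["name", "activity"])["timestamp"]
span = (gb.max() - gb.min()).rename("time_diff")

# Longest span first, then keep each person's first (i.e. longest) row
best = (span.sort_values(ascending=False)
            .reset_index()
            .drop_duplicates(subset="name"))

# Attach the number of distinct activities per person
n_act = df.groupby("name")["activity"].nunique()
best = best.assign(activity_num=best["name"].map(n_act))
print(best)
```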
