Using functions with groupby

Posted 2024-05-26 22:56:10


I have a column 'dateTime', and I am trying to achieve the following (this works without groupby):

df['time_of_day_10'] = df['dateTime'].dt.floor('10min')
df['time_of_day_30'] = df['dateTime'].dt.floor('30min')

The problem is that after I aggregate the data as follows:

    groups = df.groupby(groupbytime,as_index=True) 
    df_grouped = (groups.agg({
                'clients1': [np.mean,np.max,],
                'clients2': [np.mean,np.max,],
                }))

I lose my dateTime column, so I tried to add it back with the following:

    groups = df.groupby(groupbytime,as_index=True)
    df_grouped = (groups.agg({
                'dateTime':['first'],
                'clients1': [np.mean,np.max,],
                'clients2': [np.mean,np.max,],
                }))

This gives me:

dateTime           first               datetime64[ns]

I am trying to get the rounded time and the date as separate columns in the groupby result. Thanks!

Sample data added. Raw data:

    dateTime    Clients1    Clients2
8   2017-10-23 08:00:04.854309  12991.5 2
10  2017-10-23 08:00:04.875162  12991.5 1
11  2017-10-23 08:00:04.875162  12991.5 1
12  2017-10-23 08:00:04.875162  12991.5 1
13  2017-10-23 08:00:04.875162  12991.5 1
23  2017-10-23 08:00:04.876464  12989.5 1
24  2017-10-23 08:00:04.876464  12989.5 1
32  2017-10-23 08:00:04.964356  12990   1
34  2017-10-23 08:00:04.968549  12990.5 1
38  2017-10-23 08:00:05.008758  12990   1
43  2017-10-23 08:00:05.996090  12990   2
45  2017-10-23 08:00:06.018212  12990   1
51  2017-10-23 08:00:06.344568  12989.5 1
56  2017-10-23 08:00:06.903661  12990   1
60  2017-10-23 08:00:07.120324  12990   1
66  2017-10-23 08:00:07.206179  12990.5 1
74  2017-10-23 08:00:07.358889  12991.5 3
77  2017-10-23 08:00:07.491244  12991   1
80  2017-10-23 08:00:07.671106  12991   1
83  2017-10-23 08:00:07.897968  12991   1
87  2017-10-23 08:00:08.028444  12991   1
95  2017-10-23 08:00:09.787827  12991.5 3
98  2017-10-23 08:00:10.178936  12991.5 3
104 2017-10-23 08:00:10.505921  12991.5 2
110 2017-10-23 08:00:11.438628  12992   1
112 2017-10-23 08:00:12.145907  12992   1

The result is:

    dateTime    Clients1    Clients1    Clients2    Clients2
    first   mean    amax    mean    amax
1min                    
2017-10-23 08:00:00 2017-10-23 08:00:04.854309  12988.8902439024    12993.5 227 12987.7398373984
2017-10-23 08:01:00 2017-10-23 08:01:00.005942  12986.92    12988.5 84  12986.28
2017-10-23 08:02:00 2017-10-23 08:02:00.901496  12987.6486486486    12988.5 98  12987
2017-10-23 08:03:00 2017-10-23 08:03:00.521976  12986.8148148148    12987.5 65  12986.1296296296
2017-10-23 08:04:00 2017-10-23 08:04:02.800922  12986.4705882353    12986.5 47  12985.5294117647
2017-10-23 08:05:00 2017-10-23 08:05:00.670865  12985.3658536585    12986   88  12984.7804878049
2017-10-23 08:06:00 2017-10-23 08:06:00.141393  12987.359375    12988   103 12986.734375
2017-10-23 08:07:00 2017-10-23 08:07:00.922107  12987.5454545455    12988   34  12986.7727272727
2017-10-23 08:08:00 2017-10-23 08:08:00.165103  12986.8214285714    12988   46  12986.0714285714
2017-10-23 08:09:00 2017-10-23 08:09:01.910121  12988.96875 12990   145 12988.328125
2017-10-23 08:10:00 2017-10-23 08:10:00.008064  12988.2678571429    12989.5 102 12987.6785714286
2017-10-23 08:11:00 2017-10-23 08:11:05.533862  12989.4318181818    12991   71  12988.8636363636
2017-10-23 08:12:00 2017-10-23 08:12:01.124564  12991.0444444444    12992.5 144 12990.4444444444
2017-10-23 08:13:00 2017-10-23 08:13:00.347987  12992.84375 12995   185 12992.0390625
2017-10-23 08:14:00 2017-10-23 08:14:00.627402  12994.2906976744    12996   216 12993.6395348837
2017-10-23 08:15:00 2017-10-23 08:15:00.032132  12994.8859649123    12996.5 211 12994.298245614

1 answer
User
Reply 1 · posted 2024-05-26 22:56:10

One possible solution is to apply floor after agg:

df_grouped[('time_of_day_10', 'first')] = df_grouped[('dateTime', 'first')].dt.floor('10min')
df_grouped[('time_of_day_30', 'first')] = df_grouped[('dateTime', 'first')].dt.floor('30min')
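
The two lines above can be tried end to end with a minimal sketch like the one below. The sample rows are made up to resemble the question's data, and the `1min` grouping column is an assumption (the question does not show how `groupbytime` is defined). String aggregation names (`'mean'`, `'max'`) are used here in place of `np.mean` and `np.max`.

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's data
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:07.358889",
        "2017-10-23 08:01:00.005942",
        "2017-10-23 08:12:01.124564",
    ]),
    "Clients1": [12991.5, 12990.0, 12988.5, 12992.5],
    "Clients2": [2, 1, 3, 1],
})

# Assumed grouping key: each timestamp floored to its minute
df["1min"] = df["dateTime"].dt.floor("1min")

# Aggregate; the dict-of-lists form yields MultiIndex columns
df_grouped = df.groupby("1min").agg({
    "dateTime": ["first"],
    "Clients1": ["mean", "max"],
    "Clients2": ["mean", "max"],
})

# Floor the aggregated timestamp after agg, as suggested in the answer
first_seen = df_grouped[("dateTime", "first")]
df_grouped[("time_of_day_10", "first")] = first_seen.dt.floor("10min")
df_grouped[("time_of_day_30", "first")] = first_seen.dt.floor("30min")

print(df_grouped)
```

Because the columns are a MultiIndex, each new column is addressed with a tuple such as `('time_of_day_10', 'first')`.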

Edit: if you need the maximum date per group, use a custom function with date:

groups = df.groupby('dateTime',as_index=True) 
df_grouped = (groups.agg({
                'dateTime':[lambda x: x.dt.date.max()],
                 'Clients1': [np.mean,np.max,],
                 'Clients2': [np.mean,np.max,],
                 }))

print (df_grouped.dtypes)
Clients1  mean        float64
          amax        float64
dateTime  <lambda>     object <-pure python date is object
Clients2  mean          int64
          amax          int64
dtype: object

Or, if you need the maximum datetime per group, use floor by day ('d'):

df_grouped = (groups.agg({
                'dateTime':[lambda x: x.dt.floor('d').max()],
                 'Clients1': [np.mean,np.max,],
                 'Clients2': [np.mean,np.max,],
                 }))

print (df_grouped.dtypes)
Clients1  mean               float64
          amax               float64
dateTime  <lambda>    datetime64[ns] <- floor returns a pandas datetime
Clients2  mean                 int64
          amax                 int64
dtype: object

print (df_grouped)
                           Clients1             dateTime Clients2     
                               mean     amax    <lambda>     mean amax
dateTime                                                              
2017-10-23 08:00:04.854309  12991.5  12991.5  2017-10-23        2    2
2017-10-23 08:00:04.875162  12991.5  12991.5  2017-10-23        1    1
2017-10-23 08:00:04.876464  12989.5  12989.5  2017-10-23        1    1
2017-10-23 08:00:04.964356  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:04.968549  12990.5  12990.5  2017-10-23        1    1
2017-10-23 08:00:05.008758  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:05.996090  12990.0  12990.0  2017-10-23        2    2
2017-10-23 08:00:06.018212  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:06.344568  12989.5  12989.5  2017-10-23        1    1
2017-10-23 08:00:06.903661  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:07.120324  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:07.206179  12990.5  12990.5  2017-10-23        1    1
2017-10-23 08:00:07.358889  12991.5  12991.5  2017-10-23        3    3
2017-10-23 08:00:07.491244  12991.0  12991.0  2017-10-23        1    1
2017-10-23 08:00:07.671106  12991.0  12991.0  2017-10-23        1    1
2017-10-23 08:00:07.897968  12991.0  12991.0  2017-10-23        1    1
2017-10-23 08:00:08.028444  12991.0  12991.0  2017-10-23        1    1
2017-10-23 08:00:09.787827  12991.5  12991.5  2017-10-23        3    3
2017-10-23 08:00:10.178936  12991.5  12991.5  2017-10-23        3    3
2017-10-23 08:00:10.505921  12991.5  12991.5  2017-10-23        2    2
2017-10-23 08:00:11.438628  12992.0  12992.0  2017-10-23        1    1
2017-10-23 08:00:12.145907  12992.0  12992.0  2017-10-23        1    1
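
For the original goal of having the date and the rounded time as separate columns, one alternative is sketched below. It is not the answer's exact method: it swaps in named aggregation to get flat column names, and the sample rows, the `1min` key, and the output names (`first_seen`, `date`, `time_of_day_10`) are all hypothetical.

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's data
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:07.358889",
        "2017-10-23 08:01:00.005942",
        "2017-10-23 08:12:01.124564",
    ]),
    "Clients1": [12991.5, 12990.0, 12988.5, 12992.5],
    "Clients2": [2, 1, 3, 1],
})

# Assumed grouping key: each timestamp floored to its minute
df["1min"] = df["dateTime"].dt.floor("1min")

# Named aggregation: output_name=(input_column, function)
df_grouped = df.groupby("1min").agg(
    first_seen=("dateTime", "first"),
    clients1_mean=("Clients1", "mean"),
    clients1_max=("Clients1", "max"),
    clients2_mean=("Clients2", "mean"),
    clients2_max=("Clients2", "max"),
)

# Date as a pandas datetime (midnight of that day)
df_grouped["date"] = df_grouped["first_seen"].dt.floor("D")
# Time of day rounded down to 10 minutes, as datetime.time objects
df_grouped["time_of_day_10"] = df_grouped["first_seen"].dt.floor("10min").dt.time

print(df_grouped[["first_seen", "date", "time_of_day_10"]])
```

Note that `.dt.time` yields Python `datetime.time` values, so that column has `object` dtype, while `date` keeps a pandas datetime dtype.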
