python来自2个groupby列的聚合计数的最大值

import pandas as pd hours = [8,8,9,9, 8,9,10,10, 8,9,12,12, 10,11,12,12] names = ['A', 'B', 'C', 'D'] * 4 df = pd.DataFrame({'names': names, 'hours', hours})

# This will get me the aggregated count based on names and hours df.groupby(['names', 'hours']).size().reset_index(name='count') # result names hours count A 8 3 10 1 B 8 1 9 2 11 1 C 9 1 10 1 12 2 D 9 1 10 1 12 2 # To get the max value for each names & hours group (But failed) df.groupby(['names', 'hours']).size().reset_index(name='count').\ groupby(['names','hours']).max() # I get the same result as I got above

3条回答

网友

1楼 · 编辑于 2024-04-23 18:32:53

使用groupby和value\u计数的另一种方法：

(
    df.groupby('names')
    .apply(lambda x: x.hours.value_counts().nlargest(1))    
    .reset_index()
    .set_axis(['names','hours','count'], axis=1, inplace=False)
)

Out[249]: 
  names  hours  count
0     A      8      3
1     B      9      2
2     C     12      2
3     D     12      2

网友

2楼 · 编辑于 2024-04-23 18:32:53

df.groupby(['names', 'hours']) \
  .size() \
  .reset_index(name='count') \
  .sort_values('count', ascending=False) \
  .drop_duplicates(['names'])

网友

3楼 · 编辑于 2024-04-23 18:32:53

这个怎么样：

grouped = df.groupby(['names', 'hours']).size().reset_index(name='count')
final = df.loc[df.groupby(['names'])['count'].transform(max) == df['count']]
final
#names  hours  count
#A      8      3
#B      9      2
#C      12     2
#D      12     2

相关问题更多 >

编程相关推荐

热门问题

热门文章