Group by one column and find the number of unique values in another column

          date  hour staff
0   2019-10-01     6     A
1   2019-10-01     6     B
2   2019-10-01     6     C
3   2019-10-02     6     D
4   2019-10-02     6     B
5   2019-10-02     6     A
6   2019-10-03     6     B
7   2019-10-03     6     B
8   2019-10-03     6     B
9   2019-10-01     7     D
10  2019-10-01     7     A
11  2019-10-01     7     B
12  2019-10-01     7     C
13  2019-10-02     7     D
14  2019-10-02     7     C
15  2019-10-02     7     A
16  2019-10-03     7     B
17  2019-10-03     7     B
18  2019-10-03     7     A
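To try the answers below, the sample data can be rebuilt roughly like this. It is only a sketch: the question does not say whether date is stored as a string or a datetime, so keeping it as plain strings here is an assumption.

```python
import pandas as pd

# Rebuild the 19 sample rows from the question.
# Assumption: 'date' is kept as a plain string, not converted to datetime.
dates = (
    ["2019-10-01"] * 3 + ["2019-10-02"] * 3 + ["2019-10-03"] * 3    # hour 6
    + ["2019-10-01"] * 4 + ["2019-10-02"] * 3 + ["2019-10-03"] * 3  # hour 7
)
hours = [6] * 9 + [7] * 10
staff = list("ABCDBABBB") + list("DABCDCABBA")

df = pd.DataFrame({"date": dates, "hour": hours, "staff": staff})
print(df)
```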

2 answers

User

Answer 1 · edited 2024-05-29 06:00:30

df.groupby(['hour', 'date'])['staff'].nunique().reset_index()\
  .groupby('hour')['staff'].mean().round()

>>> output

6   2.0
7   3.0
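It may help to look at the intermediate result. The sketch below splits the one-liner into its two steps on the sample data from the question; the names per_slot and avg are only illustrative.

```python
import pandas as pd

# Sample data from the question (date kept as a string, which is an assumption).
df = pd.DataFrame({
    "date": ["2019-10-01"] * 3 + ["2019-10-02"] * 3 + ["2019-10-03"] * 3
          + ["2019-10-01"] * 4 + ["2019-10-02"] * 3 + ["2019-10-03"] * 3,
    "hour": [6] * 9 + [7] * 10,
    "staff": list("ABCDBABBB") + list("DABCDCABBA"),
})

# Step 1: number of distinct staff for every (hour, date) pair.
per_slot = df.groupby(["hour", "date"])["staff"].nunique()
print(per_slot)  # hour 6 -> 3, 3, 1; hour 7 -> 4, 3, 2

# Step 2: average those counts over the dates of each hour.
avg = per_slot.groupby("hour").mean()
print(avg)  # hour 6 -> 7/3 (about 2.33), hour 7 -> 3.0
```

Rounding 2.33 and 3.0 gives the 2.0 and 3.0 shown in the output above.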

Edit:

anky_91's solution in the comments is much faster and should definitely be used:

df.groupby(['date','hour'])['staff'].nunique().mean(level=1).round()
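One caveat for newer pandas versions: the level= argument of Series.mean() was deprecated in pandas 1.3 and removed in 2.0, so the line above raises an error there. A sketch of an equivalent that groups on the index level instead, using the sample data from the question:

```python
import pandas as pd

# Sample data from the question (date kept as a string, which is an assumption).
df = pd.DataFrame({
    "date": ["2019-10-01"] * 3 + ["2019-10-02"] * 3 + ["2019-10-03"] * 3
          + ["2019-10-01"] * 4 + ["2019-10-02"] * 3 + ["2019-10-03"] * 3,
    "hour": [6] * 9 + [7] * 10,
    "staff": list("ABCDBABBB") + list("DABCDCABBA"),
})

# groupby(level="hour") plays the role of the removed mean(level=1):
# it averages the per-(date, hour) unique counts within each hour.
result = (
    df.groupby(["date", "hour"])["staff"]
      .nunique()
      .groupby(level="hour")
      .mean()
      .round()
)
print(result)
```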

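User

Answer 2 · edited 2024-05-29 06:00:30

I don't have enough reputation to comment. In the first solution, selecting ['staff'] a second time is redundant once reset_index() is moved to the end, and putting reset_index() at the end is also slightly better.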

df.groupby(['date','hour'])['staff'].nunique().groupby('hour').mean().round().reset_index()

Alternative syntax using agg:

df.groupby(['date','hour']).agg(lambda x: x.nunique()).groupby('hour').mean().round() \
.reset_index()

If you really want the result as int, you can replace round() with astype(int):

df.groupby(['date','hour'])['staff'].nunique().mean(level=1).astype(int).reset_index()
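Note that astype(int) truncates toward zero rather than rounding, so the two variants can disagree. A small sketch with made-up averages (the values 2.33 and 2.67 are purely illustrative, not taken from the question):

```python
import pandas as pd

# Hypothetical per-hour averages, chosen only to show the difference.
avg = pd.Series([2.33, 2.67], index=pd.Index([6, 7], name="hour"))

truncated = avg.astype(int)          # drops the fractional part
rounded = avg.round().astype(int)    # rounds first, then converts

print(truncated.tolist())  # [2, 2]
print(rounded.tolist())    # [2, 3]
```

On the sample data in the question both approaches happen to give 2 and 3, since the averages are about 2.33 and exactly 3.0.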
