Counting unique values listed across two columns with groupby

import pandas as pd

d = {
    'B': ['08:00:00', 'John', '08:10:00', 'Gary', '08:41:42', 'John',
          '08:50:00', 'John', '09:00:00', 'Gary', '09:15:00', 'John',
          '09:21:00', 'Gary', '09:30:00', 'Gary', '09:40:00', 'Gary'],
    'C': ['1', '1', '1', '1', '1', '1',
          '2', '2', '2', '2', '2', '2',
          '3', '3', '3', '3', '3', '3'],
    'A': ['Stop', '', 'Res', '', 'Start', '',
          'Stop', '', 'Res', '', 'Start', '',
          'Stop', '', 'Res', '', 'Start', ''],
}
df = pd.DataFrame(data=d)

2 Answers

User

Answer 1 · edited 2024-05-13 17:48:04

We can create a flattened DataFrame (this needs `import numpy as np` first):

In [34]: d = pd.DataFrame(np.column_stack((df.iloc[::2], df.iloc[1::2, [0]])), columns=['time','id','op','name'])

In [35]: d
Out[35]:
       time id     op  name
0  08:00:00  1   Stop  John
1  08:10:00  1    Res  Gary
2  08:41:42  1  Start  John
3  08:50:00  2   Stop  John
4  09:00:00  2    Res  Gary
5  09:15:00  2  Start  John
6  09:21:00  3   Stop  Gary
7  09:30:00  3    Res  Gary
8  09:40:00  3  Start  Gary
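
The same reshaping can also be sketched without NumPy. This variant is not from the answer; it assumes the rows strictly alternate between an event row and a name row, and the intermediate names `events` and `names` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['08:00:00', 'John', '08:10:00', 'Gary', '08:41:42', 'John',
          '08:50:00', 'John', '09:00:00', 'Gary', '09:15:00', 'John',
          '09:21:00', 'Gary', '09:30:00', 'Gary', '09:40:00', 'Gary'],
    'C': ['1'] * 6 + ['2'] * 6 + ['3'] * 6,
    'A': ['Stop', '', 'Res', '', 'Start', ''] * 3,
})

# Even rows hold time / id / operation; odd rows hold the name in column 'B'.
events = df.iloc[::2].reset_index(drop=True)
names = df.iloc[1::2, 0].reset_index(drop=True)

# Rename the event columns and attach the name as its own column.
d = events.rename(columns={'B': 'time', 'C': 'id', 'A': 'op'}).assign(name=names)
print(d)
```

Keeping every step in pandas preserves the column labels through the reshaping, so the columns are renamed rather than assigned by position.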

A MultiIndex would consist of:

[code block lost in extraction; it defined `idx`, the MultiIndex used in the next step]
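
A minimal sketch of how an index like `idx` could be built, assuming it is the Cartesian product of the unique `name` and `op` values. That is an assumption inferred from the result further down, which lists every name/operation pair, and is not necessarily the answerer's exact code. The flattened frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Flattened frame from the previous step, rebuilt so this sketch is self-contained.
d = pd.DataFrame({
    'time': ['08:00:00', '08:10:00', '08:41:42', '08:50:00', '09:00:00',
             '09:15:00', '09:21:00', '09:30:00', '09:40:00'],
    'id': ['1', '1', '1', '2', '2', '2', '3', '3', '3'],
    'op': ['Stop', 'Res', 'Start'] * 3,
    'name': ['John', 'Gary', 'John', 'John', 'Gary', 'John', 'Gary', 'Gary', 'Gary'],
})

# Assumption: every (name, op) pair, in order of first appearance in the data.
idx = pd.MultiIndex.from_product([d['name'].unique(), d['op'].unique()])
print(idx.tolist())
```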

Group by the two columns:

In [39]: res = d.groupby(['name','op'])['id'].count().reindex(idx, fill_value=0)

In [40]: res
Out[40]:
John  Stop     2
      Res      0
      Start    2
Gary  Stop     1
      Res      3
      Start    1
Name: id, dtype: int64
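
Since the question asks about unique values, `nunique()` may be closer to the intent than `count()`; on this data the two agree. The following is a sketch that is not part of the original answer: it fills the missing John/Res combination with 0 via `unstack(fill_value=0)` instead of reindexing. The variable name `wide` is chosen here for illustration.

```python
import pandas as pd

# Flattened frame from the first step, rebuilt so this sketch is self-contained.
d = pd.DataFrame({
    'time': ['08:00:00', '08:10:00', '08:41:42', '08:50:00', '09:00:00',
             '09:15:00', '09:21:00', '09:30:00', '09:40:00'],
    'id': ['1', '1', '1', '2', '2', '2', '3', '3', '3'],
    'op': ['Stop', 'Res', 'Start'] * 3,
    'name': ['John', 'Gary', 'John', 'John', 'Gary', 'John', 'Gary', 'Gary', 'Gary'],
})

# Count distinct ids per (name, op), then pivot 'op' into columns;
# combinations that never occur are filled with 0.
wide = d.groupby(['name', 'op'])['id'].nunique().unstack(fill_value=0)
print(wide)
```

The result is a wide table with one row per name and one column per operation, which can be easier to read than the long form when there are only a few operations.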

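User

Answer 2 · edited 2024-05-13 17:48:04

This is an odd DataFrame; I strongly recommend not putting times and names in the same column. Just add another column! That would make things much simpler.

Given your data, and if you don't mind the John/Res combination being absent from the result (it never occurs, so it gets no row rather than a 0):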

df[df==''] = None
df = df.ffill()  # fillna(method='ffill') is deprecated in recent pandas
df[df['B'].isin(['Gary', 'John'])].groupby(['B', 'A']).C.nunique()
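
For reference, here is a self-contained sketch of the same idea on the question's data. It uses `DataFrame.mask` to blank out the empty strings, which is a substitution for the in-place assignment above rather than the answerer's exact code; the names `filled`, `people` and `res` are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'B': ['08:00:00', 'John', '08:10:00', 'Gary', '08:41:42', 'John',
          '08:50:00', 'John', '09:00:00', 'Gary', '09:15:00', 'John',
          '09:21:00', 'Gary', '09:30:00', 'Gary', '09:40:00', 'Gary'],
    'C': ['1'] * 6 + ['2'] * 6 + ['3'] * 6,
    'A': ['Stop', '', 'Res', '', 'Start', ''] * 3,
})

# Treat empty strings as missing, then carry each operation down
# to the name row directly below it.
filled = df.mask(df == '').ffill()

# Keep only the name rows and count distinct 'C' values per (name, operation).
people = filled[filled['B'].isin(['Gary', 'John'])]
res = people.groupby(['B', 'A'])['C'].nunique()
print(res)
```

Only combinations that actually occur appear in `res`, so there is no row for John/Res.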
