用Python进行枚举和分组

2024-04-29 04:48:59 发布

您现在位置:Python中文网/ 问答频道 /正文

我有以下数据帧(包含更多列)

ID  CAT ACTION  D1  
1   PRE opened  111 
1   PRE phone   111 
1   PRE opened  111 
1   PRE phone   111 
2   POS opened  222 
2   POS navi    222 
1   PRE opened  111 
1   PRE phone   111 
3   POS opened  111 
3   POS navi    222 
3   POS phone   222 
3   POS navi    222 
2   POS opened  222 
2   POS phone   222 
5   PRE opened  111 
6   PRE opened  111 
6   PRE opened  111 
6   PRE phone   111 
6   PRE navi    111 

这个表有活动的寄存器,我想为每个打开的操作标记操作,如下所示

ID  CAT ACTION  D1  flg
1   PRE opened  111 1
1   PRE phone   111 1
1   PRE opened  111 2
1   PRE phone   111 2
2   POS opened  222 3
2   POS navi    222 3
1   PRE opened  111 4
1   PRE phone   111 4
3   POS opened  111 5
3   POS navi    222 5
3   POS phone   222 5
3   POS navi    222 5
2   POS opened  222 6
2   POS phone   222 6
5   PRE opened  111 7
6   PRE opened  111 8
6   PRE opened  111 9
6   PRE phone   111 9
6   PRE navi    111 9

我使用这个代码,但它需要很长时间才能完成

i=0
for y in range(1, len(df)):
    if(df.loc[y,'ACTION']=='opened'):
        i+=1
        df.loc[y,'new_column']=i
    else:
        df.loc[y,'new_column']=i

最终的目标是有这样的东西,我假装使用pivot

flg ID  CAT D1  Int   ACTION
1   1   PRE 111 phone  opened-phone
2   1   PRE 111 phone  opened-phone
3   2   POS 222 navi   opened-navi
4   1   PRE 111 phone  opened-phone
5   3   POS 222 phone  opened-navi-phone-navi
6   2   POS 222 phone  opened-phone
7   5   PRE 111 opened opened
8   6   PRE 111 opened opened
9   6   PRE 111 phone  opened-phone-navi

Tags: 数据posiddfnewphonecolumnaction
1条回答
网友
1楼 · 发布于 2024-04-29 04:48:59

可以将cumsum与Boolean一起使用

df['flag']=df.ACTION.eq('opened').cumsum()
df1=df.groupby(['flag','ID','CAT']).agg({'ACTION':['last',lambda x : x.str.cat(sep='-')],'D1':'last'})# you can rename it by df1.columns=['Your columns name']
df1
Out[358]: 
             ACTION                           D1
               last                <lambda> last
flag ID CAT                                     
1    1  PRE   phone            opened-phone  111
2    1  PRE   phone            opened-phone  111
3    2  POS    navi             opened-navi  222
4    1  PRE   phone            opened-phone  111
5    3  POS    navi  opened-navi-phone-navi  222
6    2  POS   phone            opened-phone  222
7    5  PRE  opened                  opened  111
8    6  PRE  opened                  opened  111
9    6  PRE    navi       opened-phone-navi  111

更多信息

df.ACTION.eq('opened').cumsum()
Out[352]: 
0     1
1     1
2     2
3     2
4     3
5     3
6     4
7     4
8     5
9     5
10    5
11    5
12    6
13    6
14    7
15    8
16    9
17    9
18    9
Name: ACTION, dtype: int32

相关问题 更多 >