可以基于nunique值删除数据帧中的行吗?

2024-06-02 05:53:53 发布

您现在位置:Python中文网/ 问答频道 /正文

我想忽略行,因为职业的唯一名称少于2个:

name        value      occupation
   a           23      mechanic
   a           24      mechanic
   b           30      mechanic
   c           40      mechanic
   c           41      mechanic
   d           30      doctor
   d           20      doctor
   e           70      plumber
   e           71      plumber
   f           30      plumber
   g           50      tailor

我做到了:

df.groupby('ocuupation')['name'].nunique()
>>>>>>
occupation
mechanic   3
doctor     1
plumber    2
tailor     1
Name: name, dtype: int64

可以使用类似df = df.drop(df[<some boolean condition>].index)的东西吗

期望输出:

name        value      occupation
   a           23      mechanic
   a           24      mechanic
   b           30      mechanic
   c           40      mechanic
   c           41      mechanic
   e           70      plumber
   e           71      plumber
   f           30      plumber

1条回答
网友
1楼 · 发布于 2024-06-02 05:53:53

使用^{}^{}来获取大于等于2的值:

df = df[df.groupby('occupation')['name'].transform('nunique').ge(2)]
print (df)
  name  value occupation
0    a     23   mechanic
1    a     24   mechanic
2    b     30   mechanic
3    c     40   mechanic
4    c     41   mechanic
7    e     70    plumber
8    e     71    plumber
9    f     30    plumber

您的解决方案是对^{}中比较的串联索引值进行过滤:

s = df.groupby('occupation')['name'].nunique()

df = df[df['occupation'].isin(s[s.ge(2)].index)]
print (df)
  name  value occupation
0    a     23   mechanic
1    a     24   mechanic
2    b     30   mechanic
3    c     40   mechanic
4    c     41   mechanic
7    e     70    plumber
8    e     71    plumber
9    f     30    plumber

相关问题 更多 >