如何在条件下分组/标记到同一组

2024-06-16 11:16:13 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个如下所示的数据框。此数据帧包含名称和地址信息

l_name   f_name   m_name    city_state     zip 
smith     sam                auburn,wa    98910
smith     sam       c        auburn,wa    98910
smith     sam                durham,nc    27659
smith     sam     eddie      auburn,wa    98910
smith     sam                auburn,wa    98910
  :
  :
aaron     joe     chris      ocala,fl     58910
aaron     joe                ocala,fl     58910
aaron     joe                durham,nc    27659
aaron     joe                ocala,fl     58910
aaron     joe       a        ocala,fl     58910

我正在尝试创建一个代码,如果它们的l_name、f_name、city_state和zip列都匹配,则可以将它们分组/标记到同一个组中

下面是我希望看到的输出数据帧

l_name   f_name   m_name    city_state     zip       tag
smith     sam                auburn,wa    98910       1
smith     sam       c        auburn,wa    98910       1
smith     sam                durham,nc    27659       0
smith     sam     eddie      auburn,wa    98910       1
smith     sam                auburn,wa    98910       1
clair     may                seattle,wa   98092       2
clair     may       a        seattle,wa   98092       2
clair     may                seattle,wa   98092       2
  :
  :
aaron     joe     chris      ocala,fl     58910       n
aaron     joe                ocala,fl     58910       n
aaron     joe                durham,nc    27659       0
aaron     joe                ocala,fl     58910       n
aaron     joe       a        ocala,fl     58910       n

如果此人不属于任何组,它将为其分配一个0。4 sam smith符合条件,因此他们都属于同一组,即1。因此,标记/组2将是may clair,而joe aaron将是n组,这取决于数据帧中存储了多少个名称

我想知道有没有什么好的方法、方法或建议可以让我创建一个可以输出我想要的结果的代码

谢谢


Tags: 数据namecitysammaysmithncjoe
2条回答

您可以使用>;1计数并获得组号(和+1),然后对于计数为1的组,将标记设置为0。然后连接

df1 = df[df.groupby(['l_name', 'f_name', 'city_state', 'zip'])['m_name'].transform('count')>1].copy()
df1['tag'] = df1.groupby(['l_name', 'f_name', 'city_state', 'zip']).ngroup() + 1

df2 = df[df.groupby(['l_name', 'f_name', 'city_state', 'zip'])['m_name'].transform('count')==1].copy()
df2['tag'] = 0

df_final = pd.concat([df1,df2])
df_final

  l_name f_name m_name city_state    zip  tag
0  smith    sam      x  auburn,wa  98910    2
1  smith    sam      c  auburn,wa  98910    2
3  smith    sam  eddie  auburn,wa  98910    2
4  smith    sam      x  auburn,wa  98910    2
5  aaron    joe  chris   ocala,fl  58910    1
6  aaron    joe      x   ocala,fl  58910    1
8  aaron    joe      x   ocala,fl  58910    1
9  aaron    joe      a   ocala,fl  58910    1
2  smith    sam      x  durham,nc  27659    0
7  aaron    joe      x  durham,nc  27659    0

您可以使用ngroup的组合,然后计算其值:

# create a ID variable with a unique identifier for each group
groupvar = df.groupby(['l_name', 'f_name', 'city_state', 'zip']).ngroup(ascending=False) + 1
# assign it to the data 
df = df.assign(tag=groupvar)
df
    l_name  f_name  m_name  city_state  zip     tag
0   smith   sam             auburn,wa   98910   2
1   smith   sam     c       auburn,wa   98910   2
2   smith   sam             durham,nc   27659   1
3   smith   sam     eddie   auburn,wa   98910   2
4   smith   sam             auburn,wa   98910   2
5   clair   may             seattle,wa  98092   3
6   clair   may     a       seattle,wa  98092   3
7   clair   may             seattle,wa  98092   3
8   aaron   joe     chris   ocala,fl    58910   4
9   aaron   joe             ocala,fl    58910   4
10  aaron   joe             durham,nc   27659   5
11  aaron   joe             ocala,fl    58910   4
12  aaron   joe     a       ocala,fl    58910   4

# determine which IDs occur only once and assign 0 to them
df.loc[df['tag'].value_counts()[df['tag']].values == 1, 'tag'] = 0
df
    l_name  f_name  m_name  city_state  zip     tag
0   smith   sam             auburn,wa   98910   2
1   smith   sam     c       auburn,wa   98910   2
2   smith   sam             durham,nc   27659   0
3   smith   sam     eddie   auburn,wa   98910   2
4   smith   sam             auburn,wa   98910   2
5   clair   may             seattle,wa  98092   3
6   clair   may     a       seattle,wa  98092   3
7   clair   may             seattle,wa  98092   3
8   aaron   joe     chris   ocala,fl    58910   4
9   aaron   joe             ocala,fl    58910   4
10  aaron   joe             durham,nc   27659   0
11  aaron   joe             ocala,fl    58910   4
12  aaron   joe     a       ocala,fl    58910   4

相关问题 更多 >