使用2列匹配多列的值

2024-06-16 11:29:23 发布

您现在位置:Python中文网/ 问答频道 /正文

样品测向:

ID   Name     Match1    Random_Col    Match2    Price    Match3     Match4       Match5
1    Apple      Yes     Random Value   No        10      Yes        Yes          Yes
2    Apple      Yes     Random Value1  No        10      Yes        Yes          No
3    Apple      Yes     Random Value2  No        15      No         Yes          Yes
4    Orange     No      Random Value   Yes       12      Yes        Yes          No
5    Orange     No      Random Value   Yes       12      No         No           No
6    Banana     Yes     Random Value   No        15      Yes        No           No
7    Apple      Yes     Random Value   No        15      No        Yes          Yes

预期DF:

ID   Name     Match1    Random_Col    Match2  Price Match3  Match4 Match5 Final_Match
1    Apple      Yes     Random Value   No      10    Yes    Yes    Yes   Full
2    Apple      Yes     Random Value1  No      10    Yes    Yes    No  Partial
3    Apple      Yes     Random Value2  No      15    No     Yes    Yes Partial
4    Orange     No      Random Value   Yes     12    Yes    Yes    No    Full
5    Orange     No      Random Value   Yes     12    No     No     No Partial
6    Banana     Yes     Random Value   No      15    Yes    No     No   Full
7    Apple      Yes     Random Value   No      15    No     Yes    Yes Partial

问题陈述:

  1. 如果组合NamePrice是非重复的,只需将Full放在Final_Match列中(示例ID 6)
  2. 如果组合NamePrice是重复的,那么在这些组合中,在Match1到Match5列中计算Yes,无论哪个列的“Yes”越大,则为其中一列放入Full,为另一列放入Partial(示例ID 1&;2和4,5)

  3. 如果组合NamePrice是重复的,那么在Match1到Match5列中的ID计数Yes内,如果它们具有相等的“Yes”,则将Partial放在这两个列中(示例ID 3,7)

代码

s = (df.replace({'Yes': 1, 'No': 0})
     .iloc[:, 1:]
     .sum(1))

df['final_match'] = np.where(s.groupby(df[['Price','Name']]).rank(ascending=False).eq(1), 'Full ','Partial')

当我只需要groupby一列时,上面的代码就可以工作了,比如说Name,但是它不能用于组合。你知道吗

任何帮助!!你知道吗


Tags: nonameid示例appledfvaluerandom
1条回答
网友
1楼 · 发布于 2024-06-16 11:29:23

用途:

#count Yes values only in Match columns
s = df.filter(like='Match').eq('Yes').sum(axis=1)
#mask for unique combinations
m1 = ~df.duplicated(['Price','Name'], keep=False)
#create new column filled by Series s
m2 = df.assign(new=s).groupby(['Price','Name'])['new'].rank(ascending=False).eq(1)
#chain masks by bitwise OR
df['final_match'] = np.where(m1 | m2, 'Full ','Partial')
print (df)

   ID    Name Match1     Random_Col Match2  Price Match3 Match4 Match5  \
0   1   Apple    Yes   Random Value     No     10    Yes    Yes    Yes   
1   2   Apple    Yes  Random Value1     No     10    Yes    Yes     No   
2   3   Apple    Yes  Random Value2     No     15     No    Yes    Yes   
3   4  Orange     No   Random Value    Yes     12    Yes    Yes     No   
4   5  Orange     No   Random Value    Yes     12     No     No     No   
5   6  Banana    Yes   Random Value     No     15    Yes     No     No   
6   7   Apple    Yes   Random Value     No     15     No    Yes    Yes   

  final_match  
0       Full   
1     Partial  
2     Partial  
3       Full   
4     Partial  
5       Full   
6     Partial  

相关问题 更多 >