Set the value of a pandas DataFrame winner column based on the majority value of three other columns

Published 2024-06-16 10:37:27


I have this pandas df:

id  Vote1     Vote2      Vote3 
123 Positive  Negative   Positive
223 Positive  Negative   Neutral 
323 Positive  Negative   Negative  
423 Positive  Positive             

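I want to add another column named Winner that is set to whichever vote has the majority. If the votes are tied, it should be set to the first vote, as shown for id=223.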

So the resulting df should be:

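id  Vote1     Vote2      Vote3     Winner
123 Positive  Negative   Positive  Positive
223 Positive  Negative   Neutral   Positive
323 Positive  Negative   Negative  Negative
423 Positive  Positive             Positive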

This may be related to "Update Pandas Cells based on Column Values and Other Columns".
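The rule in the question can be written as a plain function over one row's votes. This is a minimal sketch, not one of the posted answers: the helper name `pick_winner` is hypothetical, blank votes ('' or None) are assumed not to count, and any tie for the top count is assumed to fall back to the first vote.

```python
from collections import Counter


def pick_winner(votes):
    """Return the majority vote; on a tie for the top count, return votes[0].

    Assumption: empty strings and None are treated as missing and not counted.
    """
    counts = Counter(v for v in votes if v not in ("", None))
    if not counts:
        return votes[0]
    ranked = counts.most_common()
    # A tie exists when the runner-up has the same count as the leader
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return votes[0]
    return ranked[0][0]


print(pick_winner(["Positive", "Negative", "Positive"]))  # Positive
print(pick_winner(["Positive", "Negative", "Neutral"]))   # Positive (tie -> first vote)
print(pick_winner(["Positive", "Negative", "Negative"]))  # Negative
print(pick_winner(["Positive", "Positive", ""]))          # Positive
```

To fill the column, it could be applied row-wise, e.g. `df['Winner'] = df[['Vote1', 'Vote2', 'Vote3']].apply(lambda r: pick_winner(list(r)), axis=1)`. Unlike the first two answers, this treats Neutral as an ordinary vote, so a row with two Neutral votes is won by Neutral.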


3 answers

I wrote a function and applied it to the df. In my experience it is usually a little faster than a plain loop.

import pandas as pd
import numpy as np

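def vote(row):
    # Count Positive and Negative votes in the row (Neutral and blanks are not counted)
    pos = np.sum(row.values == 'Positive')
    neg = np.sum(row.values == 'Negative')
    if pos > neg:
        return 'Positive'
    elif pos < neg:
        return 'Negative'
    else:
        # Tie between Positive and Negative: fall back to the first vote
        return row['Vote1']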

# Create the dataframe
df = pd.DataFrame()
df['id']=[123,223,323,423]
df['Vote1']=['Positive']*4
df['Vote2']=['Negative']*3+['Positive']
df['Vote3']=['Positive','Neutral','Negative','']
df = df.set_index('id')
df['Winner'] = df.apply(vote,axis=1)

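Result:

id  Vote1     Vote2      Vote3     Winner
123 Positive  Negative   Positive  Positive
223 Positive  Negative   Neutral   Positive
323 Positive  Negative   Negative  Negative
423 Positive  Positive             Positive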
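The same Positive-versus-Negative comparison that `vote` makes can also be done without a per-row function, by counting matches column-wise with `DataFrame.eq` and `sum(axis=1)`. This is a sketch that mirrors `vote` (Neutral and blank votes are not counted), not the answerer's code.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [123, 223, 323, 423],
    "Vote1": ["Positive"] * 4,
    "Vote2": ["Negative"] * 3 + ["Positive"],
    "Vote3": ["Positive", "Neutral", "Negative", ""],
}).set_index("id")

votes = df[["Vote1", "Vote2", "Vote3"]]
# Count Positive and Negative votes per row without a Python-level row loop
pos = votes.eq("Positive").sum(axis=1)
neg = votes.eq("Negative").sum(axis=1)

# Start from Vote1 (the tie-break value), then overwrite the decided rows
df["Winner"] = (df["Vote1"]
                .mask(pos > neg, "Positive")
                .mask(pos < neg, "Negative"))
print(df["Winner"].tolist())  # ['Positive', 'Positive', 'Negative', 'Positive']
```

`Series.mask(cond, other)` replaces values where `cond` is True, so rows that satisfy neither condition keep their Vote1 value.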

Another pandas solution, without loops:

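import numpy as np
import pandas as pd

# df is the DataFrame from the question (with an 'id' column)
df = df.set_index('id')
# Score each vote; '' covers a blank vote such as Vote3 in row 423
rep = {'Positive': 1, 'Negative': -1, 'Neutral': 0, '': 0}
df1 = df.replace(rep)

df = df.assign(Winner=np.where(df1.sum(axis=1) > 0, 'Positive',
                               np.where(df1.sum(axis=1) < 0, 'Negative', df.iloc[:, 0])))
print(df)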

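Output:

id  Vote1     Vote2      Vote3     Winner
123 Positive  Negative   Positive  Positive
223 Positive  Negative   Neutral   Positive
323 Positive  Negative   Negative  Negative
423 Positive  Positive             Positive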

Explanation

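`df.assign` creates the new column on a copy of the original DataFrame, so the result has to be assigned back to df. The keyword argument becomes the column name, which is why it is written `Winner=`.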
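A minimal illustration of the copy behaviour described above, using a made-up two-row frame:

```python
import pandas as pd

df = pd.DataFrame({"Vote1": ["Positive", "Negative"]})

# assign() returns a new DataFrame; the original is left unchanged
out = df.assign(Winner=df["Vote1"])
print("Winner" in df.columns)   # False
print("Winner" in out.columns)  # True

# Reassigning the result is what makes the new column stick
df = df.assign(Winner=df["Vote1"])
print(list(df.columns))  # ['Vote1', 'Winner']
```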

Next, `np.where` is used to nest the if statements: np.where(condition, value_if_true, value_otherwise).

np.where(df1.sum(axis=1) > 0,           # row-wise sum of the vote scores
         'Positive',                    # if the sum is positive
         np.where(df1.sum(axis=1) < 0,  # otherwise, nested check: is it negative?
                  'Negative',           # the row sum is less than 0
                  df.iloc[:, 0]         # sum == 0 (tie): take the first vote of that row
                  )
         )
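With more than two branches, nested `np.where` calls get hard to read. `np.select` takes a list of conditions and a matching list of choices, plus a `default` for rows where none match. This is a swapped-in alternative, not the answerer's code; it assumes the same scoring map, with blank votes scored 0, and uses `Series.map` per column instead of `replace`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [123, 223, 323, 423],
    "Vote1": ["Positive"] * 4,
    "Vote2": ["Negative"] * 3 + ["Positive"],
    "Vote3": ["Positive", "Neutral", "Negative", ""],
}).set_index("id")

# Score each vote; the blank in row 423 is assumed to count as 0
rep = {"Positive": 1, "Negative": -1, "Neutral": 0, "": 0}
total = df.apply(lambda col: col.map(rep)).sum(axis=1)

# Conditions are checked in order; rows matching none get the default (Vote1)
df["Winner"] = np.select([total > 0, total < 0],
                         ["Positive", "Negative"],
                         default=df["Vote1"])
print(df["Winner"].tolist())
```

The row sums are 1, 0, -1 and 2, so only row 223 falls through to the Vote1 default.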

You can loop over the rows like this:

import pandas as pd
import numpy as np

# Create the dataframe
df = pd.DataFrame()
df['id']=[123,223,323,423]
df['Vote1']=['Positive']*4
df['Vote2']=['Negative']*3+['Positive']
df['Vote3']=['Positive','Neutral','Negative','']

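mostCommonVote = []
for row in df[['Vote1', 'Vote2', 'Vote3']].values:
    # distinct vote values in this row and how often each occurs
    votes, counts = np.unique(row, return_counts=True)
    if np.all(counts <= 1):
        # every vote differs (a tie), so fall back to the first vote
        mostCommonVote.append(row[0])
    else:
        mostCommonVote.append(votes[np.argmax(counts)])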

df['Winner'] = mostCommonVote

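Result:

id  Vote1     Vote2      Vote3     Winner
123 Positive  Negative   Positive  Positive
223 Positive  Negative   Neutral   Positive
323 Positive  Negative   Negative  Negative
423 Positive  Positive             Positive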

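It may not be the most elegant solution, but it is fairly simple. It uses the numpy function `unique`, which can return the count of each unique string in the row.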
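A small standalone look at what `np.unique(..., return_counts=True)` returns for one row, and how `argmax` picks the winner from it:

```python
import numpy as np

row = np.array(["Positive", "Negative", "Negative"])

# unique() returns the sorted distinct values and, with return_counts=True,
# how many times each one occurs
votes, counts = np.unique(row, return_counts=True)
print(votes.tolist())   # ['Negative', 'Positive']
print(counts.tolist())  # [2, 1]

# argmax picks the index of the largest count
print(votes[np.argmax(counts)])  # Negative
```

Because `np.unique` sorts its output, `argmax` resolves a tie between two repeated votes (possible with four or more vote columns) alphabetically rather than by which vote came first. With three columns that case cannot occur.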
