Compare values in two columns and output the result in a third column in pandas

    a_id b_received c_consumed
0    sam       soap        oil
1    sam        oil        NaN
2    sam      brush       soap
3  harry        oil      shoes
4  harry      shoes        oil
5  alice       beer       eggs
6  alice      brush      brush
7  alice       eggs        NaN

    a_id b_received c_consumed  output
0    sam       soap        oil       1
1    sam        oil        NaN       1
2    sam      brush       soap       0
3  harry        oil      shoes       1
4  harry      shoes        oil       1
5  alice       beer       eggs       0
6  alice      brush      brush       1
7  alice       eggs        NaN       1

    a_id b_received c_consumed  output
0    sam       soap        oil       1
1    sam        oil        NaN       1
2    sam      brush       soap       1
3  harry        oil      shoes       1
4  harry      shoes        oil       1
5  alice       beer       eggs       0
6  alice      brush      brush       1
7  alice       eggs        NaN       1

2 Answers

User

Reply #1 · edited 2024-05-01 21:54:32

This should work, although the ideal approach is the one given by JaminSore.

df['output'] = 0

for name in df['a_id'].unique():
    group = df.loc[df['a_id'] == name]
    # Values consumed by this a_id; b_received is checked against these.
    consumed = group['c_consumed'].values
    for idx, row in group.iterrows():
        # Write by row label with .loc so the assignment hits df itself, not a copy.
        df.loc[idx, 'output'] = int(row['b_received'] in consumed)

The DataFrame now has the output column filled in.
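To try the loop-based approach end to end, here is a minimal self-contained sketch. The sample values are copied from the question's table; the helper name flag_received_in_consumed is made up for illustration and is not part of the original answer. It iterates with groupby instead of filtering on each unique a_id, which should give the same per-group membership check.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'a_id': ['sam', 'sam', 'sam', 'harry', 'harry', 'alice', 'alice', 'alice'],
    'b_received': ['soap', 'oil', 'brush', 'oil', 'shoes', 'beer', 'brush', 'eggs'],
    'c_consumed': ['oil', np.nan, 'soap', 'shoes', 'oil', 'eggs', 'brush', np.nan],
})


def flag_received_in_consumed(frame):
    """Hypothetical helper: 1 where b_received occurs in the same a_id's c_consumed."""
    flags = pd.Series(0, index=frame.index)
    for _, group in frame.groupby('a_id'):
        # Missing values are dropped so NaN never counts as a consumed item.
        consumed = set(group['c_consumed'].dropna())
        for idx, received in group['b_received'].items():
            flags.loc[idx] = int(received in consumed)
    return flags


df['output'] = flag_received_in_consumed(df)
print(df)
```

Row-by-row loops like this are easy to follow but get slow on large frames, which is presumably why the answer points to the groupby-based approach as the better one.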

User

Reply #2 · edited 2024-05-01 21:54:32

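The key is pandas.Series.isin(), which checks, for each element of the calling Series, whether it is a member of the object passed in. You want to check each element of b_received for membership in c_consumed, but only within each group defined by a_id. When you use groupby together with apply, pandas indexes the result by the grouping variable as well as by the original index. In your case you don't need the grouping variable in the index, so you can reset the index back to its original state with drop=True.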
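To make the membership check concrete before any grouping is involved, here is a small sketch of Series.isin on its own. The two Series hold the values of the sam group from the question; the variable names are only for illustration.

```python
import pandas as pd

# Values for a_id == 'sam', taken from the question's data.
received = pd.Series(['soap', 'oil', 'brush'])
consumed = pd.Series(['oil', None, 'soap'])

# For each element of `received`: does it appear anywhere in `consumed`?
# The check is against the values of `consumed`, not aligned row by row.
mask = received.isin(consumed)

# Cast the booleans to 0/1 integers, as the answer does with 'i4' (int32).
flags = mask.astype('i4')
print(flags.tolist())
```

Note that soap is flagged even though it sits in a different row of consumed, which is exactly the behaviour the question asks for within each group.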

df['output'] = (df.groupby('a_id')
               .apply(lambda x : x['b_received'].isin(x['c_consumed']).astype('i4'))
               .reset_index(level='a_id', drop=True))

Your DataFrame now has the output column filled in.

See the pandas documentation on split-apply-combine for a more thorough explanation.
