Find the words missing between two columns

Posted 2024-06-06 02:51:12


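I have two columns containing ingredients, and I want to check whether the new column is missing words or differs from the old column.

Column 1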

Index     Old
0         Caramel Color, Color, Citric Acid, Treated Water, Caffeine, Flavour Enhancer
1         Natural Extracts, Glycol, Ethanol,

Column 2

Index     New
0         Caramel Color, Color, Citric Acid, Water, Flavour Reducer
1         Glycol, Ethanol

I have tried the following, but it does not seem to work:

L = df['old']
values_not_in_array = df[~df.old.isin(L)].old
values_in_array = df[df.old(L)].old

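What is the best way to create a column holding the values from the old column that are missing from, or different in, the same row of the new column?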


2 Answers

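Convert the split values to sets, subtract one from the other, and, if needed, join the result back into a string:

# Items in Old that do not appear in New (set order is arbitrary).
# Assumes items are separated by exactly ', ' with no trailing comma.
df['diff'] = [', '.join(set(o.split(', ')) - set(n.split(', ')))
              for o, n in zip(df.Old, df.New)]
print(df)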
                                                 Old  \
0  Caramel Color, Color, Citric Acid, Treated Wat...   
1                  Natural Extracts, Glycol, Ethanol   

                                                 New  \
0  Caramel Color, Color, Citric Acid, Water, Flav...   
1                                    Glycol, Ethanol   

                                        diff  
0  Treated Water, Flavour Enhancer, Caffeine  
1                           Natural Extracts  

# Items in New that do not appear in Old
df['miss'] = [', '.join(set(n.split(', ')) - set(o.split(', ')))
              for o, n in zip(df.Old, df.New)]
print(df)
                                                 Old  \
0  Caramel Color, Color, Citric Acid, Treated Wat...   
1                  Natural Extracts, Glycol, Ethanol   

                                                 New                    miss  
0  Caramel Color, Color, Citric Acid, Water, Flav...  Water, Flavour Reducer  
1                                    Glycol, Ethanol                          
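The set subtraction above depends on every item being separated by exactly ', ' and does not keep the original order. As a minimal sketch, assuming you want whitespace and trailing commas tolerated and the original item order kept, the same idea can be written with two small helpers. The names `split_items` and `items_only_in_first` are hypothetical and do not come from the answer:

```python
def split_items(text):
    """Split a comma-separated string into stripped, non-empty items."""
    return [part.strip() for part in text.split(',') if part.strip()]


def items_only_in_first(first, second):
    """Return the items of `first` that are absent from `second`, in original order."""
    exclude = set(split_items(second))
    seen = set()
    result = []
    for item in split_items(first):
        # Skip items present in `second` and repeated items in `first`
        if item not in exclude and item not in seen:
            seen.add(item)
            result.append(item)
    return ', '.join(result)


# Sample values copied from the question (note the trailing comma in the second row)
old = ['Caramel Color, Color, Citric Acid, Treated Water, Caffeine, Flavour Enhancer',
       'Natural Extracts, Glycol, Ethanol,']
new = ['Caramel Color, Color, Citric Acid, Water, Flavour Reducer',
       'Glycol, Ethanol']

removed = [items_only_in_first(o, n) for o, n in zip(old, new)]
added = [items_only_in_first(n, o) for o, n in zip(old, new)]

print(removed)
print(added)
```

With a DataFrame, the same list comprehensions can be assigned directly to columns, for example `df['diff'] = [items_only_in_first(o, n) for o, n in zip(df['Old'], df['New'])]`.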

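This can be done by applying a function that converts each column value into a collection of words, finds the difference, and stores it in a new column: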

import pandas as pd

dic = {
    'Old': ['Caramel Color, Color, Citric Acid, Treated Water, Caffeine, Flavour Enhancer', 'Natural Extracts, Glycol, Ethanol,'],
    'New': ['Caramel Color, Color, Citric Acid, Water, Flavour Reducer', 'Glycol, Ethanol'],
}

df = pd.DataFrame(dic)

print(df)

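def split_items(text):
    # Split on commas, strip surrounding whitespace, and drop empty items
    # (the trailing comma in 'Old' would otherwise produce an empty string)
    return {item.strip() for item in text.split(',') if item.strip()}

# sorted() makes the output order deterministic, since sets are unordered
df['MissingWord'] = df.apply(
    lambda x: ', '.join(sorted(split_items(x['Old']) - split_items(x['New']))),
    axis=1)

print(df['MissingWord'])

Output:

0    Caffeine, Flavour Enhancer, Treated Water
1                             Natural Extracts
Name: MissingWord, dtype: object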
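For larger frames, a row-wise `apply` can be replaced by a reshape-and-merge approach. This is a different technique from the two answers, shown only as a sketch: each column is exploded into one row per item, and an outer merge with `indicator=True` marks which side each item came from. The helper names `to_long` and `collect` and the column names `removed` and `added` are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Old': ['Caramel Color, Color, Citric Acid, Treated Water, Caffeine, Flavour Enhancer',
            'Natural Extracts, Glycol, Ethanol,'],
    'New': ['Caramel Color, Color, Citric Acid, Water, Flavour Reducer',
            'Glycol, Ethanol'],
})


def to_long(series):
    """Return one (row, item) pair per item, stripped and de-duplicated."""
    items = series.str.split(',').explode().str.strip()
    items = items[items.ne('')]  # drop empties left by trailing commas
    long = items.rename('item').rename_axis('row').reset_index()
    return long.drop_duplicates()


# The '_merge' column says whether a (row, item) pair is in Old only, New only, or both
merged = to_long(df['Old']).merge(to_long(df['New']), on=['row', 'item'],
                                  how='outer', indicator=True)


def collect(side):
    """Join the items found on one side only, one string per original row."""
    picked = merged[merged['_merge'] == side]
    joined = picked.groupby('row')['item'].agg(', '.join)
    # Rows with nothing on this side get an empty string
    return joined.reindex(df.index, fill_value='')


df['removed'] = collect('left_only')   # in Old but not in New
df['added'] = collect('right_only')    # in New but not in Old

print(df[['removed', 'added']])
```

The order of items inside each joined string follows the order of the merged frame, so it may not match the order in the original column.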
