How do I compare two string variables in pandas?

test = pd.DataFrame({'A': ["john doe", " john doe", 'John'],
                     'B': [' john doe', 'eddie murphy', 'batman']})

test
Out[6]:
           A             B
0   john doe      john doe
1   john doe  eddie murphy
2       John        batman

test['A'].isin(test['B'])
Out[7]:
0    False
1     True
2    False
Name: A, dtype: bool

3 answers

User

Answer 1 · edited 2024-04-23 18:55:51

Strip the surrounding whitespace and convert both columns to lower case before comparing:

In [414]:
test['A'].str.strip().str.lower() == test['B'].str.strip().str.lower()

Out[414]:
0     True
1    False
2    False
dtype: bool
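
The comparison above is row by row, while the `isin` call in the question tests membership (is each value of A present anywhere in B). If membership is what you want, the same normalization can be applied to both columns first. A minimal sketch; the helper name `normalize` is made up for illustration and is not part of pandas:

```python
import pandas as pd

test = pd.DataFrame({'A': ["john doe", " john doe", 'John'],
                     'B': [' john doe', 'eddie murphy', 'batman']})

def normalize(s):
    # Hypothetical helper: trim surrounding whitespace and lower-case each value.
    return s.str.strip().str.lower()

# Row-by-row equality: A[i] is compared with B[i] only.
same_row = normalize(test['A']) == normalize(test['B'])

# Membership: A[i] is compared with every value in B.
in_b = normalize(test['A']).isin(normalize(test['B']))

print(same_row.tolist())
print(in_b.tolist())
```

The two results differ for row 1: its value does not equal the value beside it in B, but it does appear elsewhere in B once both sides are normalized.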

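User

Answer 2 · edited 2024-04-23 18:55:51

You can use difflib to compute a similarity score (ratio() returns a value between 0 and 1, where 1 means the strings are identical):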

import difflib as dfl
dfl.SequenceMatcher(None,'John Doe', 'John doe').ratio()

Edit: integration with pandas:

import pandas as pd
import difflib as dfl
df = pd.DataFrame({'A': ["john doe", " john doe", 'John'], 'B': [' john doe', 'eddie murphy', 'batman']})
df['VAR1'] = df.apply(lambda x : dfl.SequenceMatcher(None, x['A'], x['B']).ratio(),axis=1)
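
To turn the score into a yes/no match you would typically normalize the strings first and then compare the ratio against a cutoff. A rough sketch under stated assumptions: the `similarity` helper, the `MATCH` column, and the 0.8 threshold are all illustrative choices, not something difflib or pandas prescribes.

```python
import difflib as dfl
import pandas as pd

df = pd.DataFrame({'A': ["john doe", " john doe", 'John'],
                   'B': [' john doe', 'eddie murphy', 'batman']})

def similarity(a, b):
    # Hypothetical helper: normalize first so case and surrounding
    # spaces do not lower the score, then compute the difflib ratio.
    a, b = a.strip().lower(), b.strip().lower()
    return dfl.SequenceMatcher(None, a, b).ratio()

THRESHOLD = 0.8  # assumed cutoff; tune it for your own data

df['VAR1'] = [similarity(a, b) for a, b in zip(df['A'], df['B'])]
df['MATCH'] = df['VAR1'] >= THRESHOLD

print(df)
```

Only the first row is flagged here, because its two values are identical after normalization; the other pairs share almost no characters.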

User

Answer 3 · edited 2024-04-23 18:55:51

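I think you can use str.lower and str.replace with the regular expression \s+ to remove arbitrary whitespace:

import pandas as pd

test = pd.DataFrame({'A': ["john  doe", " john doe", 'John'],
                     'B': [' john doe', 'eddie murphy', 'batman']})

# r'\s+' matches any run of whitespace; regex=True is required on recent pandas versions
print(test['A'].str.lower().str.replace(r'\s+', "", regex=True) ==
      test['B'].str.strip().str.replace(r'\s+', "", regex=True))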


0     True
1    False
2    False
dtype: bool
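
Removing every whitespace character also makes "john doe" equal to "johndoe". If that is too loose for your data, one variant is to collapse each run of whitespace to a single space instead. A minimal sketch; the `clean` helper is a hypothetical name introduced only for this example:

```python
import pandas as pd

test = pd.DataFrame({'A': ["john  doe", " john doe", 'John'],
                     'B': [' john doe', 'eddie murphy', 'batman']})

def clean(s):
    # Hypothetical helper: lower-case, trim the ends, and collapse any
    # run of whitespace inside the string to a single space.
    return s.str.lower().str.strip().str.replace(r'\s+', ' ', regex=True)

result = clean(test['A']) == clean(test['B'])
print(result.tolist())

# Words stay separated, so "johndoe" is not treated as "john doe".
print(clean(pd.Series(['johndoe'])).eq('john doe').tolist())
```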
