How do I match rows where one column's string contains the string in another column?

import pandas as pd

# Each row pairs a free-text field with a candidate city string
data = [['palm springs john smith', 'spring'],
        ['palm springs john smith', 'palm springs'],
        ['palm springs john smith', 'smith'],
        ['hamptons amagansett', 'amagansett'],
        ['hamptons amagansett', 'hampton'],
        ['hamptons amagansett', 'gans'],
        ['edward riverwoods lake', 'wood'],
        ['edward riverwoods lake', 'riverwoods']]
df = pd.DataFrame(data, columns=['general_text', 'City'])

# Row-wise substring check; note that this also matches partial words
# (for example 'spring' inside 'springs')
df['match'] = df.apply(lambda x: x['City'] in x['general_text'], axis=1)

1 Answer

User

#1 · Posted on 2024-04-19 16:22:45

You can wrap the pattern in word boundaries (\b ... \b) so that only whole words match:

import re

f = lambda x: bool(re.search(r'\b{}\b'.format(x['City']), x['general_text']))

Or:

f = lambda x: bool(re.findall(r'\b{}\b'.format(x['City']), x['general_text']))

df['match'] = df.apply(f, axis = 1)
print(df)
              general_text          City  match
0  palm springs john smith        spring  False
1  palm springs john smith  palm springs   True
2  palm springs john smith         smith   True
3      hamptons amagansett    amagansett   True
4      hamptons amagansett       hampton  False
5      hamptons amagansett          gans  False
6   edward riverwoods lake          wood  False
7   edward riverwoods lake    riverwoods   True
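
The patterns above insert the City value into the regular expression as-is, which works for this data but would misbehave if a value contained regex metacharacters such as `.` or `(`. Below is a minimal sketch of a hardened variant, assuming that is a concern for your data. The helper name `contains_whole_word` and the `st. louis` sample rows are hypothetical and only for illustration; `re.escape` and `re.search` are standard-library calls.

```python
import re

def contains_whole_word(text, phrase):
    """Return True if `phrase` occurs in `text` between word boundaries."""
    # re.escape makes metacharacters in `phrase` match literally
    pattern = r'\b{}\b'.format(re.escape(phrase))
    return re.search(pattern, text) is not None

# Hypothetical sample rows: (general_text, City)
rows = [
    ('palm springs john smith', 'spring'),        # partial word
    ('palm springs john smith', 'palm springs'),  # whole phrase
    ('st. louis arch', 'st. louis'),              # literal dot in phrase
    ('stx louis arch', 'st. louis'),              # dot must not act as wildcard
]
results = [contains_whole_word(text, city) for text, city in rows]
print(results)
```

With the DataFrame from the question, the same helper could be applied row by row as `df['match'] = df.apply(lambda x: contains_whole_word(x['general_text'], x['City']), axis=1)`. Keep in mind that `\b` only marks a transition between a word character and a non-word character, so this approach fits values that begin and end with a letter, digit, or underscore.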
