How do I match rows when one column's string contains the string from another column?

Posted 2024-04-19 16:22:45


My goal is to find the City that matches the row's text in the general_text column, but the match must be exact.

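I tried searching with the in operator, but it did not give the results I expected, so I tried str.contain, but the way I used it raises an error. Any tips on how to do this correctly or efficiently?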

I tried code based on "Filtering out rows that have a string field contained in one of the rows of another column of strings":

df['matched'] = df.apply(lambda x: x.City in x.general_text, axis=1)

But the result is as follows:

import pandas as pd

# Each row is [general_text, City]
data = [['palm springs john smith', 'spring'],
    ['palm springs john smith', 'palm springs'],
    ['palm springs john smith', 'smith'],
    ['hamptons amagansett', 'amagansett'],
    ['hamptons amagansett', 'hampton'],
    ['hamptons amagansett', 'gans'],
    ['edward riverwoods lake', 'wood'],
    ['edward riverwoods lake', 'riverwoods']]

df = pd.DataFrame(data, columns=['general_text', 'City'])

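# The attempt that raises an error, kept as written: x.['City'] is a syntax
# error, and inside apply x['general_text'] is a plain str with no .str accessor.
df['match'] = df.apply(lambda x: x['general_text'].str.contain(
                                          x.['City']), axis = 1)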

I would like the code above to match only the following:

data = [['palm springs john smith', 'palm springs'],
    ['hamptons amagansett', 'amagansett'],
    ['edward riverwoods lake', 'riverwoods']]

1 answer
User
#1 · Posted 2024-04-19 16:22:45

You can wrap the pattern in word boundaries (\b ... \b) to get an exact whole-word match:

import re

f = lambda x: bool(re.search(r'\b{}\b'.format(x['City']), x['general_text']))

Or:

f = lambda x: bool(re.findall(r'\b{}\b'.format(x['City']), x['general_text']))

df['match'] = df.apply(f, axis=1)
print(df)
              general_text          City  match
0  palm springs john smith        spring  False
1  palm springs john smith  palm springs   True
2  palm springs john smith         smith   True
3      hamptons amagansett    amagansett   True
4      hamptons amagansett       hampton  False
5      hamptons amagansett          gans  False
6   edward riverwoods lake          wood  False
7   edward riverwoods lake    riverwoods   True
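
The word-boundary approach above can also be written as a small self-contained sketch. One addition here is not part of the original answer: the City value is passed through re.escape, on the assumption that real city names may contain regex metacharacters such as "." (for example "st. louis"). The helper name city_in_text is hypothetical and used only for illustration.

```python
import re

import pandas as pd

data = [
    ["palm springs john smith", "spring"],
    ["palm springs john smith", "palm springs"],
    ["palm springs john smith", "smith"],
    ["hamptons amagansett", "amagansett"],
    ["hamptons amagansett", "hampton"],
    ["hamptons amagansett", "gans"],
    ["edward riverwoods lake", "wood"],
    ["edward riverwoods lake", "riverwoods"],
]
df = pd.DataFrame(data, columns=["general_text", "City"])


def city_in_text(city: str, text: str) -> bool:
    """Return True if `city` occurs in `text` as a whole word or phrase.

    Hypothetical helper, not from the original answer.
    """
    # re.escape keeps characters such as '.' in a city name literal
    # (an assumption about the data; the sample cities contain none).
    pattern = r"\b{}\b".format(re.escape(city))
    return re.search(pattern, text) is not None


# Apply the helper row by row without DataFrame.apply.
df["match"] = [
    city_in_text(city, text)
    for city, text in zip(df["City"], df["general_text"])
]
print(df)
```

Note that "smith" still counts as a match, because it appears as a whole word in the text. If only the three rows listed in the question should match, an extra rule beyond word boundaries would be needed.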
