Matching whole strings while excluding substrings in Python

Published 2024-04-25 21:46:16

I have a dictionary of word frequencies, as follows.

mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}

I also have a string (with punctuation removed), as follows.

recipes_book = "For todays lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"

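From the string above, I want to output only "biscuit pudding", "yummy tim tam", and "milk" by looking them up in the dictionary. "sugar" should not be output, because the string only contains "rawsugar".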

However, the code I am currently using also outputs "sugar".

import re

mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
searcher = re.compile(r'{}'.format("|".join(mydictionary.keys())), flags=re.I | re.S)

for match in searcher.findall(recipes_book):
    print(match)

How can I avoid matching substrings like this and only consider complete tokens, such as "milk"? Please help.


3 Answers

Another approach is to use re.escape. See the documentation for re.escape for more information.

import re

mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"

val_list = []

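for key in mydictionary:
    # \b on both sides makes the key match only as a whole word or phrase;
    # re.escape protects any regex metacharacters inside the key.
    regex_tmp = r'\b' + re.escape(str(key)) + r'\b'
    val_list.extend(re.findall(regex_tmp, recipes_book))

print(val_list)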

Output:

"C:\Program Files (x86)\Python27\python.exe" C:/Users/punddin/PycharmProjects/demo/demo.py
['yummy tim tam', 'biscuit pudding', 'milk']
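The loop above runs one search per key. As a rough sketch of the same idea, the escaped keys can also be joined into a single pattern so the text is scanned once. The helper name find_whole_terms is hypothetical (it is not from the answer), and sorting the keys longest-first is an extra precaution I added so that longer phrases are tried before shorter ones.

```python
import re

mydictionary = {'yummy tim tam': 3, 'milk': 2, 'chocolates': 5, 'biscuit pudding': 3, 'sugar': 2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"


def find_whole_terms(text, terms):
    """Return the terms that occur in text as whole words, in text order.

    Hypothetical helper for illustration only.
    """
    # Try longer phrases first so a short key cannot pre-empt a longer one.
    ordered = sorted(terms, key=len, reverse=True)
    # One alternation of escaped keys, wrapped in word boundaries.
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(term) for term in ordered) + r')\b',
        flags=re.I,
    )
    return [match.group(0) for match in pattern.finditer(text)]


found = find_whole_terms(recipes_book, mydictionary)
print(found)  # ['biscuit pudding', 'yummy tim tam', 'milk']
```

Because finditer walks the text from left to right, the results come back in the order they appear in the string rather than in dictionary order.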

You can update your code to use regex word boundaries:

import re

mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
searcher = re.compile(r'{}'.format("|".join(map(lambda x: r'\b{}\b'.format(x), mydictionary.keys()))), flags=re.I | re.S)

for match in searcher.findall(recipes_book):
    print(match)

Output:

biscuit pudding
yummy tim tam
milk
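Since the dictionary holds frequencies, it may be useful to map each match back to its count. The following is only a sketch: it assumes every dictionary key is lowercase (true for the sample data), so a case-insensitive match can be looked up after lowercasing. The name matched_counts is made up for this example, and re.escape is added as a precaution that the code above does not include.

```python
import re

mydictionary = {'yummy tim tam': 3, 'milk': 2, 'chocolates': 5, 'biscuit pudding': 3, 'sugar': 2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"

# Same word-boundary alternation as above, with each key escaped first.
searcher = re.compile(
    "|".join(r'\b{}\b'.format(re.escape(key)) for key in mydictionary),
    flags=re.I,
)

# Assumption: keys are lowercase, so lowercasing a match yields a valid key.
matched_counts = {
    match.group(0).lower(): mydictionary[match.group(0).lower()]
    for match in searcher.finditer(recipes_book)
}

print(matched_counts)  # {'biscuit pudding': 3, 'yummy tim tam': 3, 'milk': 2}
```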

Use the word boundary '\b'. Put simply:

>>> import re
>>> recipes_book = "For todays lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"

>>> re.findall(r'(?is)(\bchocolates\b|\bbiscuit pudding\b|\bsugar\b|\byummy tim tam\b|\bmilk\b)',recipes_book)
['biscuit pudding', 'yummy tim tam', 'milk']
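To see why this excludes "sugar", here is a small check of how '\b' behaves. A word boundary only exists between a word character and a non-word character (or the start or end of the string), so there is none between the "w" and the "s" in "rawsugar". The variable names are just for illustration.

```python
import re

# Without boundaries, 'sugar' is found inside 'rawsugar'.
plain = re.search(r'sugar', 'rawsugar')

# With boundaries, there is no match: 'w' and 's' are both word characters.
bounded = re.search(r'\bsugar\b', 'rawsugar')

# When 'sugar' stands alone as a token, the bounded pattern does match.
separate = re.search(r'\bsugar\b', 'raw sugar')

print(plain is not None)     # True
print(bounded is not None)   # False
print(separate is not None)  # True
```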
