How to check for words that are not immediately followed by a keyword, and words that are not surrounded by the keyword?

Posted 2024-06-08 14:26:22


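I am trying to look for words that do not immediately come before 'the'.

I used a positive lookbehind, (?<=the\W), to get the words that come right after the keyword 'the'. However, I am unable to capture 'people' and 'that', because that logic does not apply to these cases.

In other words, I cannot handle words that have no keyword 'the' directly before or after them (for example, 'that' and 'people' in the sentence).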

import re

p = re.compile(r'(?<=the\W)\w+')
m = p.findall('the part of the fair that attracts the most people is the fireworks')

print(m)

The output I currently get is:

['part', 'fair', 'most', 'fireworks']

Edit:

Thanks for all the help below. Using the suggestions from the comments, I managed to update my code:

p = re.compile(r"\b(?!the)(\w+)(\W\w+\Wthe)?")
m = p.findall('the part of the fair that attracts the most people is the fireworks')

This gets me closer to the output I need.

Updated output:

[('part', ' of the'), ('fair', ''),
 ('that', ' attracts the'), ('most', ''),
 ('people', ' is the'), ('fireworks', '')]

I only need the strings ('part', 'fair', 'that', 'most', 'people', 'fireworks'). Any suggestions?
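A minimal sketch of one way to get plain strings from the pattern above, assuming the only problem is that `findall` returns one tuple per match when the pattern has two capture groups. Turning the optional tail into a non-capturing group `(?:...)` leaves a single group, so `findall` returns a list of strings. The variable names `sentence`, `words`, `p_original` and `first_groups` are just illustrative.

```python
import re

sentence = 'the part of the fair that attracts the most people is the fireworks'

# Same pattern as in the edit, except the optional tail is non-capturing,
# so findall returns the contents of the single remaining group.
p = re.compile(r"\b(?!the)(\w+)(?:\W\w+\Wthe)?")
words = p.findall(sentence)
print(words)  # ['part', 'fair', 'that', 'most', 'people', 'fireworks']

# Alternative: keep the original two-group pattern and take the first
# item of every tuple.
p_original = re.compile(r"\b(?!the)(\w+)(\W\w+\Wthe)?")
first_groups = [word for word, _ in p_original.findall(sentence)]
```

Note that `(?!the)` also skips any word that merely starts with "the", such as "then"; writing `(?!the\b)` would restrict the check to the exact word.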


3 Answers

Using a regular expression:

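import re
# Replace every "<word> the" pair with just "the", dropping the word before it
m = re.sub(r'\b(\w+)\b the', 'the', 'the part of the fair that attracts the most people is the fireworks')
# Split on single spaces and discard any empty or whitespace-only pieces
print([word for word in m.split(' ') if not word.isspace() and word])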

Output:

['the', 'part', 'the', 'fair', 'that', 'the', 'most', 'people', 'the', 'fireworks']
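The output above still contains 'the' itself. If the keyword should be dropped as well, a pair of negative lookaheads can do both checks in one pass. This is only a sketch under that assumption, not part of the answer above; `pattern` and `result` are illustrative names.

```python
import re

sentence = 'the part of the fair that attracts the most people is the fireworks'

# \b(?!the\b)   the word starting here is not "the" itself
# \w+\b         consume the whole word (the \b keeps the match from ending mid-word)
# (?!\W+the\b)  the word that follows is not "the"
pattern = re.compile(r"\b(?!the\b)\w+\b(?!\W+the\b)")
result = pattern.findall(sentence)
print(result)  # ['part', 'fair', 'that', 'most', 'people', 'fireworks']
```

Because each word is tested on its own, this does not depend on how the words happen to pair up around 'the'.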

I am trying to look for words that do not immediately come before 'the'.

Note that the code below does not use re:

words = 'the part of the fair that attracts the most people is the fireworks'
words_list = words.split()
words_not_before_the = []
for idx, w in enumerate(words_list):
    if idx < len(words_list)-1 and words_list[idx + 1] != 'the':
        words_not_before_the.append(w)
words_not_before_the.append(words_list[-1])
print(words_not_before_the)

Output:

['the', 'part', 'the', 'fair', 'that', 'the', 'most', 'people', 'the', 'fireworks']
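The same loop can be written as a small helper that also leaves out the keyword itself, which is what the question asks for. A rough sketch, assuming whitespace-separated words with no punctuation; the function name `words_not_before` and its `keyword` parameter are made up for this example.

```python
def words_not_before(text, keyword='the'):
    """Return words that are not `keyword` and are not directly followed by it."""
    words = text.split()
    # Pair every word with its successor; the last word is paired with None.
    following = words[1:] + [None]
    return [w for w, nxt in zip(words, following)
            if w != keyword and nxt != keyword]

sentence = 'the part of the fair that attracts the most people is the fireworks'
print(words_not_before(sentence))
# ['part', 'fair', 'that', 'most', 'people', 'fireworks']
```

Pairing each word with the next one through `zip` avoids the index arithmetic and the separate append for the last word.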

I am trying to look for words that do not immediately come before the.

Try this:

import re

# The capture group (\w+) matches a word, that is followed by a word, followed by the word: "the"
p = re.compile(r'(\w+)\W\w+\Wthe')
m = p.findall('the part of the fair that attracts the most people is the fireworks')
print(m)

Output:

['part', 'that', 'people']
