Print the words between two specific words in a given string

Posted 2024-04-20 03:25:09


If one word is not followed by the other, skip it. Here is my string:

x = 'john got shot dead. john with his .... ? , john got killed or died in 1990. john with his wife dead or died'

I want to print and count all the words between john and dead, death, or died. If a john is not followed by died, dead, or death, skip it and start again from the next john.

My code:

^{pr2}$

My output:

 got shot 
2
 with his          john got killed or 
6
 with his wife 
3

The output I want:

got shot
2
got killed or
3
with his wife
3

I don't know where I went wrong. This is only a sample input; I need to check 20,000 inputs at a time.


2 Answers

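I'm assuming that if another john appears in the string before dead|died|death occurs, you want to start over from that john.

In that case, you can split the string on the word john and then match against each of the resulting parts: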

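import re

x = 'john got shot dead. john with his .... ? , john got killed or died in 1990. john with his wife dead or died'
# Strip punctuation, then collapse runs of whitespace into single spaces.
x = re.sub(r'\W+', ' ', re.sub(r'[^\w ]', '', x)).strip()
for e in x.split('john'):
    # Lazily capture everything up to the first dead/died/death in this part.
    m = re.match(r'(.+?)(dead|died|death)', e)
    if m:
        print(m.group(1))
        print(len(m.group(1).split()))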

Output:

^{pr2}$

Also note that after the substitutions I suggest (before splitting and matching), the string looks like this:

john got shot dead john with his john got killed or died in 1990 john with his wife dead or died

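That is, there are no runs of multiple whitespace characters. You could also handle this by splitting on whitespace, but I find this approach cleaner.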
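Since the question mentions checking many inputs, the split-and-match steps above can be wrapped in a function. This is a minimal sketch under stated assumptions: the name `segments_by_split`, its parameters, and the list-of-tuples return shape are made up for illustration and are not part of the original answer.

```python
import re

def segments_by_split(text, start="john", ends=("dead", "died", "death")):
    """Return (phrase, word_count) pairs for each `start` ... end-word segment.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Strip punctuation, then collapse whitespace runs, as in the answer above.
    cleaned = re.sub(r"\s+", " ", re.sub(r"[^\w ]", "", text)).strip()
    # Lazily capture everything before the first end word in a part.
    end_pattern = re.compile(r"(.+?)(?:%s)" % "|".join(map(re.escape, ends)))
    results = []
    for part in cleaned.split(start):
        m = end_pattern.match(part)
        if m:
            phrase = m.group(1).strip()
            results.append((phrase, len(phrase.split())))
    return results

x = ('john got shot dead. john with his .... ? , john got killed or died in 1990. '
     'john with his wife dead or died')
for phrase, count in segments_by_split(x):
    print(phrase)
    print(count)
```

For a batch, the same function could be called once per input string, for example `[segments_by_split(s) for s in inputs]`, where `inputs` is whatever list holds the strings.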

You can use this negative lookahead regex:

>>> import re
>>> for i in re.findall(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)', x):
...     print(i.strip())
...     print(len(i.split()))
...

got shot
2
got killed or
3
with his wife
3

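Instead of .*?, this regex uses (?:(?!john).)*?, which lazily matches zero or more of any character only as long as john does not appear inside the match.

I would also suggest using word boundaries to match whole words: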
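To make the difference concrete, here is a small sketch that runs both patterns on the sample string from the question. The variable names `plain` and `tempered` are only for illustration.

```python
import re

x = ('john got shot dead. john with his .... ? , john got killed or died in 1990. '
     'john with his wife dead or died')

# Plain lazy match: nothing stops it from running across a later "john".
plain = re.findall(r'(?<=john).*?(?=dead|died|death)', x)

# Each character is consumed only if "john" does not start at that position,
# so a segment that reaches another "john" first is abandoned.
tempered = re.findall(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)', x)

for label, matches in (('plain', plain), ('tempered', tempered)):
    print(label)
    for m in matches:
        print('   ', repr(m.strip()))
```

With the plain pattern, the second match starts at the second john and runs through the third one, which is the kind of unwanted middle result shown in the question. The second pattern drops that segment and starts again at the third john.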

^{pr2}$
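One way the word-boundary idea could look is sketched below. The exact placement of `\b` is an assumption, not necessarily the pattern the answer used, and the helper name `between_words` is hypothetical. A capture group is used here in place of the lookbehind and lookahead.

```python
import re

# \b keeps "john" from matching inside "johnny" and "dead" inside "deadly".
# Assumed pattern for illustration only.
PATTERN = re.compile(r'\bjohn\b((?:(?!\bjohn\b).)*?)\b(?:dead|died|death)\b')

def between_words(text):
    """Return (phrase, word_count) pairs for each john ... dead/died/death segment."""
    results = []
    for segment in PATTERN.findall(text):
        phrase = segment.strip()
        results.append((phrase, len(phrase.split())))
    return results

x = ('john got shot dead. john with his .... ? , john got killed or died in 1990. '
     'john with his wife dead or died')
print(between_words(x))
print(between_words('johnny was here dead. john fell deadly ill and died'))
```

In the second call, johnny is not treated as a start word and deadly is not treated as an end word, so the only segment found runs from john to died.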
