Match a sentence in the middle of a block of text until I hit "Hello World"?

Posted 2024-04-28 22:47:06


Suppose I have the block of text below, and I want to match the text that comes before HELLO WORLD. What regular expression would be appropriate?

I tried: Te pri\.[?=HELLO WORLD] but it didn't match anything.

Lorem ipsum dolor sit amet, timeam evertitur ex eos, utamur temporibus disputationi eum te. 
Te pri dicant exerci nonumy, in case erat albucius mei.  
Pertinax periculis concludaturque eum te, et nam vero nominavi deterruisset. HELLO WORLD. 
Ex augue scriptorem pri. Vocent minimum quaerendum duo eu, habemus adipiscing ex eum.

Keep in mind that I'm still fairly new to regular expressions.


3 Answers

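You're looking for one or more occurrences of any character: .+

You also want to make sure that another pattern comes after it, without including that pattern in the match. This is called a positive lookahead, (?=...).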

.+(?=HELLO WORLD)

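A minimal Python sketch of the lookahead pattern above, run against the question's sample text (copied here with trailing spaces trimmed). Without any flags, . does not cross newlines, so the match is confined to the line that contains HELLO WORLD, and the lookahead only checks that HELLO WORLD follows without consuming it.

```python
import re

# The question's sample text (trailing spaces trimmed).
text = """Lorem ipsum dolor sit amet, timeam evertitur ex eos, utamur temporibus disputationi eum te.
Te pri dicant exerci nonumy, in case erat albucius mei.
Pertinax periculis concludaturque eum te, et nam vero nominavi deterruisset. HELLO WORLD.
Ex augue scriptorem pri. Vocent minimum quaerendum duo eu, habemus adipiscing ex eum."""

# No flags: '.' stops at newlines, so the match cannot start on an earlier line.
# (?=HELLO WORLD) asserts what follows but leaves it out of the match.
m = re.search(r".+(?=HELLO WORLD)", text)
print(repr(m.group()))
```

The printed match is the third line up to (but not including) HELLO WORLD, including the space before it.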

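If you also want to match newlines, just use the s flag/modifier to extend the meaning of . (in Python that is re.DOTALL, or (?s) inline).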

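The same pattern with the s behaviour switched on, again as an illustrative sketch on the question's text. In Python's re module the flag is re.DOTALL (inline form (?s)), and with it the match now starts at the very beginning of the text. Note that .+ is greedy, so if the text contained several HELLO WORLD markers, this match would run up to the last one.

```python
import re

# The question's sample text (trailing spaces trimmed).
text = """Lorem ipsum dolor sit amet, timeam evertitur ex eos, utamur temporibus disputationi eum te.
Te pri dicant exerci nonumy, in case erat albucius mei.
Pertinax periculis concludaturque eum te, et nam vero nominavi deterruisset. HELLO WORLD.
Ex augue scriptorem pri. Vocent minimum quaerendum duo eu, habemus adipiscing ex eum."""

# re.DOTALL lets '.' match '\n' too, so the match spans several lines
# and begins at position 0.
m = re.search(r".+(?=HELLO WORLD)", text, re.DOTALL)
print(m.group())
```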

You need the following regular expression:

(?s)(Te pri.*?)HELLO WORLD

Broken down, the pattern means:

(?s)     Make the '.' regex metacharacter match newlines too
(        Start a capturing group
Te pri   Match exactly 'Te pri'
.        The dot metacharacter matches any character; because of (?s), this includes newlines
*        Match the prior metacharacter, character class or group zero or more times
         By default will match as many times as possible
?        When paired with '*', it makes '*' match as few times as possible
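         This way, '.*?' stops at the first 'HELLO WORLD' instead of running on to the last one
)        End the capturing group
HELLO WORLD  Match 'HELLO WORLD' literally; it must follow, but it is not part of group 1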
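To see what the ? after * changes, here is a small sketch comparing the lazy .*? with a greedy .* on a made-up string that contains HELLO WORLD twice (the sample string is hypothetical, invented only for this comparison).

```python
import re

# Hypothetical input with two HELLO WORLD markers, invented for illustration.
sample = "Te pri one. HELLO WORLD. Te pri two.\nHELLO WORLD."

# Lazy: '.*?' grows one character at a time and stops at the first HELLO WORLD.
lazy = re.search(r"(?s)(Te pri.*?)HELLO WORLD", sample)
# Greedy: '.*' grabs everything, then backs off only to the last HELLO WORLD.
greedy = re.search(r"(?s)(Te pri.*)HELLO WORLD", sample)

print(repr(lazy.group(1)))
print(repr(greedy.group(1)))
```

With only one HELLO WORLD in the question's text, both versions would return the same group; the difference only shows up when the marker repeats.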

Use .group() to access what the group captured, for example:

import re
regex = re.compile(r"(?s)(Te pri.*?)HELLO WORLD")
m = regex.search(your_text)  # search(), not match(): match() only tries at the start of the string
if m:
    print(m.group(1))
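A quick check of why re.match is the wrong call for this text, sketched with the question's sample text standing in for your_text: match() only tries the pattern at position 0, and the text starts with "Lorem", so it returns None; search() scans forward and finds "Te pri" on the second line.

```python
import re

# The question's sample text (trailing spaces trimmed), standing in for your_text.
text = """Lorem ipsum dolor sit amet, timeam evertitur ex eos, utamur temporibus disputationi eum te.
Te pri dicant exerci nonumy, in case erat albucius mei.
Pertinax periculis concludaturque eum te, et nam vero nominavi deterruisset. HELLO WORLD.
Ex augue scriptorem pri. Vocent minimum quaerendum duo eu, habemus adipiscing ex eum."""

regex = re.compile(r"(?s)(Te pri.*?)HELLO WORLD")

# match() is anchored at the start of the string, so it fails here.
print(regex.match(text))  # None
# search() looks for the first position where the pattern matches.
print(regex.search(text).group(1))
```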

Happy coding!

Use the following:

import re

text = '''Lorem ipsum dolor sit amet, timeam evertitur ex eos, utamur temporibus disputationi eum te. 
Te pri dicant exerci nonumy, in case erat albucius mei.  
Pertinax periculis concludaturque eum te, et nam vero nominavi deterruisset. HELLO WORLD. 
Ex augue scriptorem pri. Vocent minimum quaerendum duo eu, habemus adipiscing ex eu'''


try:
    foundSubString = re.search(r'(?s)(Te\spri\sdicant.*?)HELLO WORLD', text).group(1)
except AttributeError:
    foundSubString = '' # apply your error handling

print('Match Found:', foundSubString)
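A variant of the same approach that checks for a missing match explicitly instead of catching AttributeError: re.search returns None when nothing matches, so testing the result directly keeps unrelated AttributeErrors from being silently swallowed. The helper name find_before is hypothetical, chosen only for this sketch.

```python
import re

def find_before(text, pattern):
    """Return group 1 of pattern in text, or '' if there is no match.

    find_before is a hypothetical helper, not part of the re module.
    """
    m = re.search(pattern, text)
    # re.search returns None on failure; test for it instead of relying on
    # the AttributeError raised by None.group(1).
    return m.group(1) if m else ""

# The question's sample text (trailing spaces trimmed).
text = """Lorem ipsum dolor sit amet, timeam evertitur ex eos, utamur temporibus disputationi eum te.
Te pri dicant exerci nonumy, in case erat albucius mei.
Pertinax periculis concludaturque eum te, et nam vero nominavi deterruisset. HELLO WORLD.
Ex augue scriptorem pri. Vocent minimum quaerendum duo eu, habemus adipiscing ex eum."""

found = find_before(text, r"(?s)(Te\spri\sdicant.*?)HELLO WORLD")
print("Match Found:", found)
```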
