Explanation of a complex regular expression

{"Timestamp": "Tue Apr 07 00:32:29 EDT 2015",Title: Indian Herald: India's Latest News, Business, Sport, Weather, Travel, Technology, Entertainment, Politics, Finance Product: Gecko CPUs: 8 Language: en-GB"}

3 answers

User

Answer 1 · edited 2024-05-13 04:29:43

You don't need a regular expression this complex to get the title. Use

Title:\s*(.*?)(?=\s*<br/?>)

See the demo.

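We match the literal Title:, then optional whitespace with \s*, then any characters up to <br> with (.*?)(?=\s*<br/?>). The lookahead only checks that <br> (or <br/>) follows; it does not consume it.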
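As a rough illustration of the pattern above, here is a minimal Python sketch. The `message` string is a hypothetical sample that assumes the fields are separated by `<br>` or `<br/>` tags; it is not the exact input from the question.

```python
import re

# Hypothetical sample input: assumes fields are separated by <br> / <br/> tags.
message = "Title: Indian Herald: India's Latest News <br/>Product: Gecko<br>CPUs: 8"

# Literal "Title:", optional whitespace, then a lazy capture that stops
# as soon as optional whitespace plus <br> or <br/> lies ahead.
pattern = re.compile(r"Title:\s*(.*?)(?=\s*<br/?>)")

match = pattern.search(message)
title = match.group(1) if match else None
print(title)  # Indian Herald: India's Latest News

# The lookahead does not consume the tag, so the match ends before " <br/>".
rest = message[match.end():]
print(rest.startswith(" <br/>"))  # True
```

Because the whitespace sits inside the lookahead, the captured title comes back without the trailing space.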

As for (?:(?!<br>).)+, it matches one or more characters, consuming each one only if <br> does not start at that position. There is an SO post where this construction is explained in detail.

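Here is an image from regex101 (go to the Regex Debugger tab, then click + on the right) showing what the construct does: at each position it checks whether <br> starts there and, if not, consumes one character, backtracking as needed:

[image: regex101 Regex Debugger screenshot]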

As for how many capturing groups the regex has: Title: ((?:(?!<br>).)+) has 1 capturing group, ((?:(?!<br>).)+), and 1 non-capturing group, (?:(?!<br>).).
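One way to check the group count is to ask Python's `re` module directly. This is only a sketch; the `message` value is a hypothetical sample that assumes a `<br>` follows the title.

```python
import re

pattern = re.compile(r"Title: ((?:(?!<br>).)+)")

# Pattern.groups reports the number of capturing groups only;
# the (?:...) group and the (?!...) lookahead are not counted.
group_count = pattern.groups
print(group_count)  # 1

# Hypothetical sample input, assumed to contain <br> after the title.
message = "Title: Indian Herald<br>Product: Gecko"
captured = pattern.search(message).groups()
print(captured)  # ('Indian Herald',)
```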

User

Answer 2 · edited 2024-05-13 04:29:43

((?:(?!<br>).)+) means:

((?:(?!<br>).)+)
^... Match the regex and capture its match into backreference 1

((?:(?!<br>).)+)
 ^... Match the regex (non capturing group)

((?:(?!<br>).)+)
    ^... Assert that it is not possible to match the regex <br>

((?:(?!<br>).)+)
            ^... Match a single character, that is not a line break character 

((?:(?!<br>).)+)
              ^... Between one and unlimited times
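The same breakdown can be written as a commented pattern with `re.VERBOSE`. This is a sketch for illustration only; the name `TEMPERED` and the test strings are made up for this example.

```python
import re

# The pattern from the breakdown above, spelled out with re.VERBOSE
# so each part carries its own comment.
TEMPERED = re.compile(r"""
    (               # capturing group 1
      (?:           # non-capturing group, repeated by the + below
        (?!<br>)    # negative lookahead: <br> must not start here
        .           # consume one character (anything except a newline)
      )+            # one or more times
    )
""", re.VERBOSE)

# Hypothetical inputs: matching stops right before the first <br>.
first = TEMPERED.search("abc<br>def").group(1)
print(first)   # abc

# A lone "<" or a different tag does not stop the match; only "<br>" does.
second = TEMPERED.search("a<b>c<br>d").group(1)
print(second)  # a<b>c
```

The second input shows why the lookahead is checked before every character: the match is cut off only where the full sequence `<br>` begins, not at any single `<`.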

User

Answer 3 · edited 2024-05-13 04:29:43

First, you don't need a lookahead here. You can do the same job with this simpler regex:

>>> re.search(r'Title: *(.+?) *<br>', message).group(1)
"Indian Herald: India's Latest News, Business, Sport, Weather, Travel, Technology, Entertainment, Politics, Finance"

As for your regex:

Title: ((?:(?!<br>).)+)

It uses a negative lookahead, (?!<br>), to check that <br> does not start at the current position before matching each character after the text Title:.
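The two regexes in this answer can be compared side by side. The sketch below uses a hypothetical `message` with a space before `<br>`, chosen to show the one place where their results differ.

```python
import re

# Hypothetical sample input with a space before the <br> tag.
message = "Title: Indian Herald <br>Product: Gecko"

# Lazy version from this answer: " *" before <br> absorbs trailing spaces.
lazy = re.search(r"Title: *(.+?) *<br>", message).group(1)

# Original version with the negative lookahead: it captures everything
# up to <br>, including the trailing space.
tempered = re.search(r"Title: ((?:(?!<br>).)+)", message).group(1)

print(repr(lazy))      # 'Indian Herald'
print(repr(tempered))  # 'Indian Herald '
```

Note that the lazy version requires a literal `<br>` after the title, so `re.search` returns None when the tag is missing, while the lookahead version still matches up to the end of the line.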
