Python regular expression to match multiple times

User
Reply #1 · edited 2024-05-12 19:16:05

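Use a two-step approach: first capture everything from "review:" to the end of the line, then tokenize that captured text.
import re
msg = 'this is the message. review: http://url.com/123 http://url.com/456'
# Step 1: capture everything after "review: " up to the end of the line.
review_pattern = re.compile(r'.*review: (.*)$')
urls = review_pattern.findall(msg)[0]
# Step 2: pull each URL and its numeric ID out of the captured text.
url_pattern = re.compile(r"(http://url.com/(\d+))")
url_pattern.findall(urls)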
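One thing the two-step version does not cover is a message with no "review:" section, in which case `findall(msg)[0]` raises an `IndexError`. A minimal sketch of the same idea wrapped in a function is shown here. The name `extract_review_urls` and the constants `REVIEW_RE` and `URL_RE` are hypothetical, and escaping the dots in `url\.com` is an added assumption so that `.` matches only a literal dot.

```python
import re

# Hypothetical names; the two patterns mirror the two steps described above.
REVIEW_RE = re.compile(r'review: (.*)$')
URL_RE = re.compile(r'http://url\.com/(\d+)')  # dots escaped (an assumption)


def extract_review_urls(msg):
    """Return (url, id) pairs listed after 'review: ', or [] if there are none."""
    # Step 1: isolate the text after "review: " up to the end of the line.
    review = REVIEW_RE.search(msg)
    if review is None:
        return []
    # Step 2: tokenize that text into individual URLs and their numeric IDs.
    return [(m.group(0), m.group(1)) for m in URL_RE.finditer(review.group(1))]


print(extract_review_urls(
    'this is the message. review: http://url.com/123 http://url.com/456'))
```

Using `search` instead of `findall(...)[0]` makes the "no review section" case explicit rather than an exception.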

User
Reply #2 · edited 2024-05-12 19:16:05

Use this. You need to put "review" outside the capturing group to get the result you want.
pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)
This produces the following output:
>>> match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
>>> match
[('http://url.com/123', '123'), ('http://url.com/456', '456')]
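A point worth keeping in mind with this pattern: because `(?:review: )?` is optional, it does not actually require the "review:" prefix, so any matching URL in the text is returned. The short sketch below illustrates that behavior; the second sample string is made up for the illustration.

```python
import re

pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)

# URLs that follow "review: " are found, one tuple per URL.
with_prefix = pattern.findall('review: http://url.com/123 http://url.com/456')

# A URL with no "review: " in front of it is found as well,
# since the non-capturing prefix group is optional.
without_prefix = pattern.findall('see http://url.com/789 for details')

print(with_prefix)
print(without_prefix)
```

If only URLs after "review:" should count, the two-step approach from the first reply is the safer choice.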

User
Reply #3 · edited 2024-05-12 19:16:05

You have extra slashes (/) in your regular expression. In Python, the pattern should just be a string, e.g. instead of this:

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)

it should be:

pattern = re.compile('review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

In Python, you would usually write the pattern as a "raw" string, like this:

pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

The extra r in front of the string tells Python to leave backslashes alone, which saves a lot of backslash escaping.
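To see why the raw-string prefix matters, here is a small sketch using `\b`, which is a backspace character in an ordinary string literal but a word boundary in a regular expression. The sample text and variable names are made up for the illustration.

```python
import re

text = 'a cat sat'

plain = '\bcat\b'        # '\b' is a backspace character in a normal string
raw = r'\bcat\b'         # raw string: backslash + 'b', a regex word boundary
escaped = '\\bcat\\b'    # same pattern as raw, with doubled backslashes

plain_matches = re.findall(plain, text)      # looks for backspace characters
raw_matches = re.findall(raw, text)          # looks for the whole word "cat"
escaped_matches = re.findall(escaped, text)

print(len(plain), len(raw))
print(plain_matches, raw_matches, escaped_matches)
```

The raw and doubled-backslash forms are the same pattern; the raw form is simply easier to read.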
