Python regex that does not match http://

6 votes

4 answers

6984 views

Asked 2025-04-16 22:26

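I've run into a problem: I want to match and replace certain words, but only when they do not appear inside a link that starts with http://.

My current regular expression is: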

 http://.*?\s+

This regex matches patterns like: http://www.egg1.com http://www.egg2.com

I need a regex that matches specific words only when they are not inside an http:// link.

For example:

"This is a sample. http://www.egg1.com and http://egg2.com. This regex will only match 
 this egg1 and egg2 and not the others contained inside http:// "

 Match: egg1 egg2

 Replaced: replaced1 replaced2

Final output:

 "This is a sample. http://www.egg1.com and http://egg2.com. This regex will only 
  match this replaced1 and replaced2 and not the others contained inside http:// "

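Question: I need to match certain patterns (egg1 and egg2 in the example), but only when they are not part of an http:// link. If egg1 and egg2 appear inside an http:// link, they should not be matched.

Tags: regular expressions, string processing, text replacement, pattern matching, negative matching, link filtering, specific words

4 Answers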

You need to put a "negative lookbehind assertion" in front of your pattern:

(?<!http://)egg[0-9]

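With this regex, whenever the engine finds text matching egg[0-9], it looks backwards to confirm that the text just before it does not match http://. A negative lookbehind starts with (?<! and ends with ). Whatever sits between those two markers must not appear immediately before the pattern that follows, and it is not included in the result. Note that the check only covers the characters directly before the match, so egg1 in http://www.egg1.com would still be matched.

How to use it in your case: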

>>> import re
>>> regex = re.compile(r'(?<!http://)egg[0-9]')
>>> a = "Example: http://egg1.com egg2 http://egg3.com egg4foo"
>>> regex.findall(a)
['egg2', 'egg4']
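
As a quick check of how far the lookbehind reaches, here is a minimal sketch that runs the same pattern against a URL containing www. The sample string is made up for illustration. Because the assertion only inspects the seven characters right before the match, the word inside http://www.egg1.com is still reported.

```python
import re

# Same pattern as above: reject "eggN" only when "http://" sits directly before it.
lookbehind = re.compile(r'(?<!http://)egg[0-9]')

# Hypothetical sample text, not taken from the question.
text = "http://www.egg1.com http://egg2.com egg3"

# egg1 is preceded by "://www.", not "http://", so the lookbehind lets it through.
matches = lookbehind.findall(text)
print(matches)  # ['egg1', 'egg3']
```

So this approach seems safe only when the word comes straight after http://.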

Answered 2025-04-16 by Python大师


This pattern matches links like http://... without capturing them, so only an egg1 outside a link ends up in group 1:

(?:http://.*?\s+)|(egg1)
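
One way to use this pattern is sketched below, assuming Python 3. The helper name words_outside_links and the sample strings are hypothetical. The idea is to walk over all matches and keep only those where group 1 was set, since a match on the URL alternative leaves it empty.

```python
import re

pattern = re.compile(r'(?:http://.*?\s+)|(egg1)')

def words_outside_links(text):
    """Return the group-1 hits, i.e. egg1 occurrences not consumed by a URL match."""
    # A match on the URL alternative has group(1) == None and is skipped.
    return [m.group(1) for m in pattern.finditer(text) if m.group(1)]

print(words_outside_links("http://www.egg1.com and egg1 here"))  # ['egg1']
```

Because the URL alternative requires trailing whitespace (\s+), a link at the very end of the text is not consumed, and an egg1 inside it would still be returned.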

Answered 2025-04-16 by Python大师

One solution I came up with is to combine the HTTP URL and your pattern into a single new pattern, then filter the results by which part matched:

import re

t = "http://www.egg1.com http://egg2.com egg3 egg4"

# Group 1 swallows whole URLs; group 2 catches eggN outside of them.
p = re.compile(r'(http://\S+)|(egg\d)')
for url, egg in p.findall(t):
    if egg:
        print(egg)

Output:

egg3
egg4

Update: to use this approach with re.sub(), just supply a replacement function:

p = re.compile(r'(http://\S+)|(egg(\d+))')

def repl(match):
    if match.group(2):
        return 'spam{0}'.format(match.group(3))
    return match.group(0)

print(p.sub(repl, t))

Output:

http://www.egg1.com http://egg2.com spam3 spam4
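
Applied to the text from the question, the same idea might look like the sketch below. The replacements dictionary and the function name replace_outside_urls are assumptions made for illustration. The word-to-replacement mapping comes from the question's example.

```python
import re

# Mapping taken from the question's example (egg1 -> replaced1, egg2 -> replaced2).
replacements = {"egg1": "replaced1", "egg2": "replaced2"}

# Group 1 consumes whole URLs; group 2 matches any target word outside a URL.
words = "|".join(re.escape(w) for w in replacements)
pattern = re.compile(r"(http://\S+)|(" + words + r")")

def replace_outside_urls(text):
    def repl(match):
        word = match.group(2)
        # URL matches are returned unchanged; target words are swapped.
        return replacements[word] if word else match.group(0)
    return pattern.sub(repl, text)

sample = ("This is a sample. http://www.egg1.com and http://egg2.com. "
          "This regex will only match this egg1 and egg2 and not the "
          "others contained inside http:// ")
print(replace_outside_urls(sample))
```

The URLs are consumed by the first alternative before the word alternative gets a chance to see them, which is why the words inside them stay untouched.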

Answered 2025-04-16 by Python大师
