Clean Python regular expressions

17 votes

3 answers

3903 views

Asked 2025-04-15 12:04

Is there a cleaner way to write long regular expressions in Python? I saw the approach below somewhere, but regular expressions in Python do not accept lists.

patterns = [
    re.compile(r'<!--([^->]|(-+[^->])|(-?>))*-{2,}>'),
    re.compile(r'\n+|\s{2}')
]

Tags: regular expressions, string processing, code simplification

3 Answers

You can put comments inside a regular expression, which makes it much easier to understand. Here is an example from http://gnosis.cx/publish/programming/regular_expressions.html:

/               # identify URLs within a text file
          [^="] # do not match URLs in IMG tags like:
                # <img src="http://mysite.com/mypic.png">
http|ftp|gopher # make sure we find a resource type
          :\/\/ # ...needs to be followed by colon-slash-slash
      [^ \n\r]+ # stuff other than space, newline, tab is in URL
    (?=[\s\.,]) # assert: followed by whitespace/period/comma 
/
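The example above uses Perl-style /.../ delimiters. A rough Python equivalent, as a sketch only, relies on re.VERBOSE to allow the same inline comments. The names URL_PATTERN and find_urls are hypothetical, and the capturing group plus the (?:...) around the resource types are additions not present in the quoted example:

```python
import re

# Hypothetical name. The (?:...) grouping is an addition: a bare
# http|ftp|gopher would split the whole pattern at each "|".
URL_PATTERN = re.compile(r"""
    [^="]                      # skip URLs inside attributes such as src="http://..."
    (                          # capture the URL itself
        (?:http|ftp|gopher)    # resource type
        ://                    # colon-slash-slash
        [^ \n\r]+              # anything except space, newline, carriage return
    )
    (?=[\s.,])                 # assert: followed by whitespace, period, or comma
""", re.VERBOSE)


def find_urls(text):
    """Return the URLs captured by URL_PATTERN (hypothetical helper)."""
    return URL_PATTERN.findall(text)


sample = 'Visit http://example.com/docs today, or <img src="http://mysite.com/mypic.png">'
print(find_urls(sample))  # ['http://example.com/docs']
```

Because the pattern starts with [^="], a URL at the very beginning of the text is not matched; that behavior is carried over from the quoted example.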

Answered 2025-04-15 by Python大师


While the re.VERBOSE suggestion from @Ayman is a good idea, if all you want is what you showed, you can simply do:

patterns = re.compile(
        r'<!--([^->]|(-+[^->])|(-?>))*-{2,}>'
        r'\n+|\s{2}'
)

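Python's implicit concatenation of adjacent string literals (much like C's) then does the rest ;-). Note that the result is a single pattern in which the second literal simply continues the first; if either of the original patterns should match on its own, put a | between them.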
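To make the concatenation behavior concrete, here is a small sketch. The names joined and either are hypothetical, and the explicit | variant is one possible way to keep both of the question's patterns usable in a single compiled object, not something the answer itself proposes:

```python
import re

# Adjacent string literals are joined into one string before re.compile sees them.
joined = (
    r'<!--([^->]|(-+[^->])|(-?>))*-{2,}>'
    r'\n+|\s{2}'
)
print(joined)  # <!--([^->]|(-+[^->])|(-?>))*-{2,}>\n+|\s{2}

# Assumption: the goal is "HTML comment OR stray whitespace". Wrapping each
# piece in (?:...) and joining them with | keeps the two alternatives separate.
either = re.compile(
    r'(?:<!--([^->]|(-+[^->])|(-?>))*-{2,}>)'  # an HTML comment
    r'|'
    r'(?:\n+|\s{2})'                            # newlines or two whitespace characters
)

print(either.sub('', 'a<!-- hi -->b\n\nc'))  # abc
```

Without the |, the joined pattern only matches a comment when newlines follow it directly, which is probably not what the two-element list in the question intended.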

Answered 2025-04-15 by Python大师


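You can use verbose mode to write more readable regular expressions. In this mode:

Whitespace within the pattern is ignored, except when it is in a character class or preceded by an unescaped backslash.
When a line contains a '#' that is neither in a character class nor preceded by an unescaped backslash, all characters from that '#' through the end of the line are ignored.

The following two statements are equivalent: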

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)

b = re.compile(r"\d+\.\d*")

(Taken from the documentation for verbose mode.)
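The two rules above also determine how to match a literal space or '#' in verbose mode. The following sketch checks the equivalence from the documentation and then shows both escapes; the names verbose, compact, and tagged are hypothetical:

```python
import re

verbose = re.compile(r"""
    \d+      # the integral part
    \.       # the decimal point
    \d*      # some fractional digits
""", re.VERBOSE)
compact = re.compile(r"\d+\.\d*")

# Both patterns accept the same input.
print(verbose.fullmatch("3.14") is not None)  # True
print(compact.fullmatch("3.14") is not None)  # True

# Literal whitespace and '#' must be protected in verbose mode.
tagged = re.compile(r"""
    issue [ ]   # a literal space, kept because it sits in a character class
    \# \d+      # an escaped '#', so it is not read as the start of a comment
""", re.VERBOSE)

print(tagged.search("see issue #42").group())  # issue #42
```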

Answered 2025-04-15 by Python大师

