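How do I translate this Perl regex idiom into Python?
I switched from Perl to Python about a year ago and haven't looked back. There is only one idiom I have found easier to write in Perl than in Python: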
if ($var =~ /foo(.+)/) {
    # do something with $1
} elsif ($var =~ /bar(.+)/) {
    # do something with $1
} elsif ($var =~ /baz(.+)/) {
    # do something with $1
}
The corresponding Python code is less elegant, because the if statements keep getting nested:
import re

m = re.search(r'foo(.+)', var)
if m:
    # do something with m.group(1)
else:
    m = re.search(r'bar(.+)', var)
    if m:
        # do something with m.group(1)
    else:
        m = re.search(r'baz(.+)', var)
        if m:
            # do something with m.group(1)
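Does anyone have an elegant way to reproduce this pattern in Python? I have seen dispatch tables built from anonymous functions, but those seem unwieldy for a small number of regular expressions...
15 Answers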
10
r"""
This is an extension of the re module. It stores the last successful
match object and lets you access its methods and attributes via
this module.

This module exports the following additional functions:
    expand     Return the string obtained by doing backslash substitution on a
               template string.
    group      Return one or more subgroups of the match.
    groups     Return a tuple containing all the subgroups of the match.
    groupdict  Return a dictionary containing all the named subgroups of the
               match.
    start      Return the index of the start of the substring matched by
               group.
    end        Return the index of the end of the substring matched by group.
    span       Return a 2-tuple of (start(), end()) of the substring matched
               by group.

This module defines the following additional public attributes:
    pos        The value of pos which was passed to the search() or match()
               method.
    endpos     The value of endpos which was passed to the search() or
               match() method.
    lastindex  The integer index of the last matched capturing group.
    lastgroup  The name of the last matched capturing group.
    re         The regular expression object which was passed to search() or
               match().
    string     The string passed to match() or search().
"""
import re as re_
from re import *
from functools import wraps

__all__ = re_.__all__ + ["expand", "group", "groups", "groupdict", "start", "end", "span",
                         "last_match", "pos", "endpos", "lastindex", "lastgroup", "re", "string"]

# State of the most recent successful match; everything is None until one occurs.
last_match = pos = endpos = lastindex = lastgroup = re = string = None

def _set_match(match=None):
    # Record a successful match in the module globals; a failed match (None)
    # leaves the previous state untouched.
    global last_match, pos, endpos, lastindex, lastgroup, re, string
    if match is not None:
        last_match = match
        pos = match.pos
        endpos = match.endpos
        lastindex = match.lastindex
        lastgroup = match.lastgroup
        re = match.re
        string = match.string
    return match

@wraps(re_.match)
def match(pattern, string, flags=0):
    return _set_match(re_.match(pattern, string, flags))

@wraps(re_.search)
def search(pattern, string, flags=0):
    return _set_match(re_.search(pattern, string, flags))

@wraps(re_.findall)
def findall(pattern, string, flags=0):
    matches = re_.findall(pattern, string, flags)
    # re.findall returns strings (or tuples), not match objects, so walk
    # finditer to obtain the last match object before recording it.
    last = None
    for last in re_.finditer(pattern, string, flags):
        pass
    _set_match(last)
    return matches

@wraps(re_.finditer)
def finditer(pattern, string, flags=0):
    for match in re_.finditer(pattern, string, flags):
        yield _set_match(match)

def expand(template):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.expand(template)

def group(*indices):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.group(*indices)

def groups(default=None):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.groups(default)

def groupdict(default=None):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.groupdict(default)

def start(group=0):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.start(group)

def end(group=0):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.end(group)

def span(group=0):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.span(group)

del wraps # Not needed past module compilation
For example, assuming the module above is saved as gre.py and imported with import gre:
if gre.match("foo(.+)", var):
    # do something with gre.group(1)
elif gre.match("bar(.+)", var):
    # do something with gre.group(1)
elif gre.match("baz(.+)", var):
    # do something with gre.group(1)
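The same "remember the last successful match" idea can be sketched without module-level globals. The block below is only an illustration, not part of the gre module above: the names LastMatch and describe are hypothetical, and only search and group are wrapped to keep it short.

```python
import re


class LastMatch:
    """Remember the most recent successful match (hypothetical helper)."""

    def __init__(self):
        self.last = None

    def search(self, pattern, string, flags=0):
        m = re.search(pattern, string, flags)
        if m is not None:
            # A failed search leaves the previous match in place.
            self.last = m
        return m

    def group(self, *indices):
        if self.last is None:
            raise TypeError("No successful match yet.")
        return self.last.group(*indices)


def describe(var):
    """Return a label plus the captured text, or None if nothing matches."""
    rx = LastMatch()
    if rx.search(r"foo(.+)", var):
        return "foo:" + rx.group(1)
    elif rx.search(r"bar(.+)", var):
        return "bar:" + rx.group(1)
    elif rx.search(r"baz(.+)", var):
        return "baz:" + rx.group(1)
    return None
```

Keeping the state on an instance rather than in a module means two callers (or two threads) do not overwrite each other's last match, at the cost of creating the helper object first.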
17
Using named groups and a dispatch table:
import re

r = re.compile(r'(?P<cmd>foo|bar|baz)(?P<data>.+)')

def do_foo(data):
    ...

def do_bar(data):
    ...

def do_baz(data):
    ...

dispatch = {
    'foo': do_foo,
    'bar': do_bar,
    'baz': do_baz,
}

m = r.match(var)
if m:
    dispatch[m.group('cmd')](m.group('data'))
With a little introspection, you can auto-generate both the regular expression and the dispatch table.
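One way that introspection might look is sketched below. It assumes the handlers are grouped on a class and follow a do_<cmd> naming convention; the Handlers class, build_dispatch and handle are hypothetical names chosen for this example, and the handlers return strings only so that the result is visible.

```python
import re


class Handlers:
    """Hypothetical container; every do_<cmd> static method handles one command."""

    @staticmethod
    def do_foo(data):
        return "foo got " + data

    @staticmethod
    def do_bar(data):
        return "bar got " + data

    @staticmethod
    def do_baz(data):
        return "baz got " + data


def build_dispatch(owner, prefix="do_"):
    """Collect every callable named <prefix><cmd> on owner into {cmd: function}."""
    return {
        name[len(prefix):]: getattr(owner, name)
        for name in dir(owner)
        if name.startswith(prefix) and callable(getattr(owner, name))
    }


dispatch = build_dispatch(Handlers)

# Build the alternation from the table keys, longest first, so a command that
# is a prefix of another command cannot shadow it.
commands = sorted(dispatch, key=len, reverse=True)
pattern = re.compile(
    r"(?P<cmd>%s)(?P<data>.+)" % "|".join(re.escape(c) for c in commands)
)


def handle(var):
    """Call the handler selected by var, or return None when nothing matches."""
    m = pattern.match(var)
    if m:
        return dispatch[m.group("cmd")](m.group("data"))
    return None
```

Adding a new command then only requires defining another do_<cmd> method; the pattern and the table follow from it.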
8
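Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can capture the condition value re.search(pattern, text) in a variable match, so that we can both check that it is not None and reuse it in the body of the condition: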
if match := re.search(r'foo(.+)', text):
    # do something with match.group(1)
elif match := re.search(r'bar(.+)', text):
    # do something with match.group(1)
elif match := re.search(r'baz(.+)', text):
    # do something with match.group(1)
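A runnable version of that chain, as a minimal sketch: the function name classify and the returned (label, captured text) tuples are made up for illustration, and the code requires Python 3.8 or later.

```python
import re


def classify(text):
    """Return (label, captured_text) for the first pattern that matches, else None."""
    if match := re.search(r"foo(.+)", text):
        return ("foo", match.group(1))
    elif match := re.search(r"bar(.+)", text):
        return ("bar", match.group(1))
    elif match := re.search(r"baz(.+)", text):
        return ("baz", match.group(1))
    return None
```

Because each pattern ends in (.+), a string such as "baz9 foo" falls through to the baz branch: the trailing "foo" has nothing after it to capture.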