Python multi-line regular expression

Asked 2025-04-16 10:16

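I'm having trouble writing a regular expression that matches across multiple lines. Can someone tell me what I'm doing wrong? I'm iterating over a basic dhcpd.conf file that contains a few hundred entries such as: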

host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
    fixed-address node20007.domain.com;
}

I already have regular expressions that match the MAC address and the fixed address, but I can't combine them into one that matches properly.

import re

f = open('/etc/dhcp3/dhcpd.conf', 'r')
re_hostinfo = re.compile(r'(hardware ethernet (.*))\;(?:\n|\r|\r\n?)(.*)', re.MULTILINE)

for host in f:
    match = re_hostinfo.search(host)
    if match:
        print(match.groups())

At the moment my matches look like this:
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', '')

But what I want is this:
('hardware ethernet 00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')


2 Answers

Sometimes the simple approach is not to use a regular expression at all. Here is an example:

with open("dhcpd.conf") as f:
    for line in f:
        sline = line.split()
        # Test each keyword separately; `"a" or "b" in line` is always true.
        if "hardware ethernet" in line or "fixed-address" in line:
            print(sline[-1])

Here is another way:

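with open("file") as f:
    data = f.read().split("}")  # one chunk per host block
for item in data:
    # Drop blank lines and strip surrounding whitespace from the rest.
    item = [i.strip() for i in item.split("\n") if i.strip()]
    for elem in item:
        if "hardware ethernet" in elem:
            print(elem.split()[-1])
    if item:
        print(item[-1])  # last line of the block, e.g. the address line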

Output:

$ more file
host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
        fixed-address node20007.domain.com;
}

host node20008
{
    hardware ethernet 00:22:38:8f:1f:44;
        some-address node20008.domain.com;
}

$ python test.py
00:22:38:8f:1f:43;
fixed-address node20007.domain.com;
00:22:38:8f:1f:44;
some-address node20008.domain.com;

Answered 2025-04-16 by Python大师


Update: I just found the real reason you are getting these results. In your code:

for host in f:
    match = re_hostinfo.search(host)
    if match:
        print(match.groups())

host refers to a single line, but your pattern needs to span two lines in order to work.
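
The difference can be seen with a small experiment. This is only a sketch: it uses the pattern from the question unchanged, and `SAMPLE` is a made-up stand-in for the contents of the real dhcpd.conf.

```python
import re

# The pattern from the question, unchanged.
re_hostinfo = re.compile(r'(hardware ethernet (.*))\;(?:\n|\r|\r\n?)(.*)', re.MULTILINE)

# Made-up sample text standing in for the real dhcpd.conf contents.
SAMPLE = (
    "host node20007\n"
    "{\n"
    "    hardware ethernet 00:22:38:8f:1f:43;\n"
    "    fixed-address node20007.domain.com;\n"
    "}\n"
)

# One line at a time: nothing follows the newline, so the last group is empty.
one_line = SAMPLE.splitlines(True)[2]
per_line_groups = re_hostinfo.search(one_line).groups()

# Whole text at once: the last group can now reach into the next line.
whole_text_groups = re_hostinfo.search(SAMPLE).groups()

print(per_line_groups)
print(whole_text_groups)
```

Even on the whole text, the last group still carries the leading whitespace, the "fixed-address" keyword and the trailing semicolon, so the pattern itself also needs tightening.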

Try this:

data = f.read()
for x in regex.finditer(data):
    process(x.groups())

Here regex is a compiled pattern that matches across both lines.
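
A self-contained sketch of the whole-file approach might look like the following. It reuses the pattern suggested later in this answer; `SAMPLE` and `find_hosts` are made-up names, and the inline string stands in for what `f.read()` would return.

```python
import re

# Pattern taken from later in this answer: MAC on one line, value on the next.
regex = re.compile(r'(hardware ethernet\s+(\S+));\s+\S+\s+(\S+);')

# Made-up sample; with a real file this would be the result of f.read().
SAMPLE = """\
host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
    fixed-address node20007.domain.com;
}

host node20008
{
    hardware ethernet 00:22:38:8f:1f:44;
    fixed-address node20008.domain.com;
}
"""

def find_hosts(data):
    """Return the groups of every match found anywhere in the text."""
    return [m.groups() for m in regex.finditer(data)]

hosts = find_hosts(SAMPLE)
for groups in hosts:
    print(groups)
```

Because `finditer` scans the full string, each match is free to cross the line break between the two statements.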

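If your file is very large and you are sure the parts you care about always sit on two consecutive lines, you can read the file line by line: check whether the current line contains the first part of the pattern, then set a flag telling you that the next line needs to be checked for the second part. If you are not sure, things get complicated, and it may be time to look at the pyparsing module.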
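
The flag idea could be sketched roughly as follows, assuming the second line is always a fixed-address statement. `MAC_RE`, `ADDR_RE` and `scan_lines` are hypothetical names, and the remembered MAC doubles as the flag.

```python
import re

# Hypothetical single-line patterns for the two halves of the match.
MAC_RE = re.compile(r'hardware ethernet\s+(\S+);')
ADDR_RE = re.compile(r'fixed-address\s+(\S+);')

def scan_lines(lines):
    """Yield (mac, address) when an address line directly follows a MAC line."""
    pending_mac = None  # the "flag": set while the previous line held a MAC
    for line in lines:
        if pending_mac is not None:
            addr = ADDR_RE.search(line)
            if addr:
                yield pending_mac, addr.group(1)
            pending_mac = None  # only the line right after the MAC counts
        mac = MAC_RE.search(line)
        if mac:
            pending_mac = mac.group(1)

# Made-up sample lines; a real run would pass the open file object instead.
SAMPLE_LINES = [
    "host node20007",
    "{",
    "    hardware ethernet 00:22:38:8f:1f:43;",
    "    fixed-address node20007.domain.com;",
    "}",
]

pairs = list(scan_lines(SAMPLE_LINES))
print(pairs)
```

Only one line is held in memory at a time, which is the point of this variant for very large files.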

Now back to the original answer, which discusses the pattern you should use:

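You don't need MULTILINE: it only changes where ^ and $ match, and your pattern uses neither. Just match whitespace. Build your pattern from these building blocks: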

(1) fixed text
(2) one or more whitespace characters
(3) one or more non-whitespace characters

Then wrap them in parentheses to get your groups.

Try this:

>>> m = re.search(r'(hardware ethernet\s+(\S+));\s+\S+\s+(\S+);', data)
>>> print(m.groups())
('hardware ethernet   00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
>>>

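Consider using "verbose mode". It lets you document which parts of the pattern match which parts of the data, which often helps you get the pattern right in the first place. Example: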

>>> regex = re.compile(r"""
... (hardware[ ]ethernet \s+
...     (\S+) # MAC
... ) ;
... \s+ # includes newline
... \S+ # variable(??) text e.g. "fixed-address"
... \s+
... (\S+) # e.g. "node20007.domain.com"
... ;
... """, re.VERBOSE)
>>> print(regex.search(data).groups())
('hardware ethernet   00:22:38:8f:1f:43', '00:22:38:8f:1f:43', 'node20007.domain.com')
>>>
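
As a possible variation on the verbose pattern above, named groups can make the results self-describing. This is a sketch only: the group names `mac` and `address`, and the names `HOST_RE` and `SAMPLE`, are illustrative choices and not part of the original answer.

```python
import re

# Verbose pattern with named groups; whitespace and comments are ignored.
HOST_RE = re.compile(r"""
    hardware[ ]ethernet \s+
    (?P<mac>\S+) ;          # MAC address, without the trailing semicolon
    \s+                     # whitespace, including the newline
    \S+                     # keyword such as "fixed-address"
    \s+
    (?P<address>\S+) ;      # e.g. "node20007.domain.com"
""", re.VERBOSE)

# Made-up sample standing in for the file contents.
SAMPLE = """\
host node20007
{
    hardware ethernet 00:22:38:8f:1f:43;
    fixed-address node20007.domain.com;
}
"""

# groupdict() returns the named groups of each match as a dictionary.
entries = [m.groupdict() for m in HOST_RE.finditer(SAMPLE)]
print(entries)
```

Callers can then read `entry["mac"]` instead of remembering group positions.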

Answered 2025-04-16 by Python大师

