Python: find blocks of text and put them into an array of dictionaries

Asked 2025-04-14 16:49

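I am trying to find blocks of text and put some of their lines into an array of dictionaries, i.e. create one dictionary for every block found. For example, given the following text: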

some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text

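I want to store the ip (host) and the zone of each block in the array, so that I end up with something like [[host:ip1,zone:zone1],[host:ip2,zone:zone2]], that is, [{'host': 'ip1', 'zone': 'zone1'}, {'host': 'ip2', 'zone': 'zone2'}] in Python.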

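I tried looping over the text file, but I can't get the loop to handle the blocks correctly. I think I need to iterate in some particular way, but I'm not sure how. What I end up with is a single array holding all the items, from the first line of the first address object up to some keyword. I need a loop per address object that moves on to the next one when it reaches a blank line.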
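A minimal sketch of the loop described in the question, assuming every block starts with an address-object line and ends with its indented exit line, so the end of a block is detected by exit rather than by the blank line. parse_blocks and WANTED are illustrative names, not part of any library:

```python
# Sample input taken from the question.
TEXT = """\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text
"""

WANTED = {"host", "zone"}  # the keys to keep from each block


def parse_blocks(lines):
    """Collect host/zone from every address-object block (sketch)."""
    result = []
    current = None  # dict for the block being read, or None outside a block
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("address-object "):
            current = {}            # a new block starts: open a fresh dict
        elif current is not None:
            if stripped == "exit":  # end of the block: store it and reset
                result.append(current)
                current = None
            else:
                key, _, value = stripped.partition(" ")
                if key in WANTED:
                    current[key] = value
    return result


print(parse_blocks(TEXT.splitlines()))
```

Because parse_blocks accepts any iterable of lines, an open file object can be passed instead of TEXT.splitlines(); the strip() call also removes the trailing newline of each line read from a file.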

3 Answers


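First, you need a very specific regular expression that describes the data blocks; see here.

Once you have found those blocks, a very simple regular expression is enough to extract the items you are interested in.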

Working example:

import re 

inp='''\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text'''
print (
    [dict(re.findall(r'(?m)^\s+(host|zone)\s+(\S+)', block.group(1))) 
        for block in re.finditer(r'(?m)^\s*$\n^(address-object\b[\s\S]+?^\s+exit\b)', inp) ]
)

Output:

[{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}]
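To make the block pattern easier to read, here is a sketch of the same two-step approach with the pattern spelled out under re.VERBOSE, where whitespace in the pattern is ignored and # starts a comment. The names BLOCK and FIELD are illustrative; the regex is intended to be equivalent to the one in the answer:

```python
import re

inp = """\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text"""

# The block pattern from the answer, written out with re.VERBOSE.
BLOCK = re.compile(r"""
    ^\s*$\n                 # a blank line directly before the block
    ^(                      # group 1: the block itself
      address-object\b      #   header keyword
      [\s\S]+?              #   anything, including newlines (lazy)
      ^\s+exit\b            #   up to the first indented exit line
    )
""", re.MULTILINE | re.VERBOSE)

# Inside one block, capture the "host <value>" and "zone <value>" pairs.
FIELD = re.compile(r"^\s+(host|zone)\s+(\S+)", re.MULTILINE)

result = [dict(FIELD.findall(m.group(1))) for m in BLOCK.finditer(inp)]
print(result)
```

Note that the pattern requires a blank line right before each address-object line, so a block at the very start of the input would not be matched.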

Alternatively, with a small tweak, you can get all the fields of each block at once:

pat=r'(?m)^\s*$\n^address-object\b.*\r?\n([\s\S]+?)\s+^\s+exit\b'
for b in re.finditer(pat, inp):
    print( 
        {k:v for k,_,v in 
            (e.strip().partition(' ') 
                for e in  b.group(1).splitlines())} )

Output:

{'name': '"name1"', 'uuid': '4ac9cf52-02b5-eecf-0100-18c24100da5e', 'zone': 'zone1', 'host': 'ip1'}
{'name': '"name2"', 'uuid': 'a5c02150-a47e-748d-0100-18c24100da5e', 'zone': 'zone2', 'host': 'ip2'}
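This variant prints one dict per block with every field. A sketch that collects those dicts into a list and then keeps only host and zone, which gives the structure asked for in the question; blocks, wanted and selected are illustrative names:

```python
import re

inp = """\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text"""

# Same pattern as the answer: group 1 holds the indented lines between
# the address-object header line and the closing exit line.
pat = r'(?m)^\s*$\n^address-object\b.*\r?\n([\s\S]+?)\s+^\s+exit\b'

# One dict per block with every field, e.g. {'name': '"name1"', 'uuid': ..., ...}
blocks = [
    {k: v for k, _, v in (e.strip().partition(' ') for e in m.group(1).splitlines())}
    for m in re.finditer(pat, inp)
]

# Keep only the keys asked for in the question.
wanted = ('host', 'zone')
selected = [{k: b[k] for k in wanted if k in b} for b in blocks]
print(selected)
```

The name values keep their double quotes, e.g. '"name1"'; calling v.strip('"') on them would drop the quotes if that is wanted.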

One possible solution uses the re module:

import re

text = """\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text
"""


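# Capture (key, value) from every "zone ..." and "host ..." line.
pat = r"\s+(zone|host)\s+(.+)"

out = re.findall(pat, text)
# Pair consecutive matches: (zone, host) of block 1, then of block 2, ...
out = [dict(t) for t in zip(out[::2], out[1::2])]
print(out)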

Output:

[{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}]
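The zip(out[::2], out[1::2]) pairing assumes that every block contains exactly one zone line and one host line, in that order; if a block lacks one of them, every later pair shifts. A sketch that searches one block at a time instead, assuming blocks are separated by blank lines as in the sample:

```python
import re

text = """\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text
"""

# Split the text into chunks separated by one or more blank lines.
chunks = re.split(r"\n\s*\n", text)

# Keep only the address-object chunks and extract the fields per chunk,
# so a block missing a key cannot shift the pairing of later blocks.
out = [
    dict(re.findall(r"(?m)^\s+(zone|host)\s+(\S+)", chunk))
    for chunk in chunks
    if chunk.startswith("address-object")
]
print(out)
```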
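importantKeys = {'host', 'zone'}  # the fields to collect from each block

with open('path/to/file') as infile:
    answer = [{}]  # start with one empty dict for the first block
    for line in infile:
        k, _, v = line.strip().partition(' ')  # "zone zone1" -> ('zone', ' ', 'zone1')
        if k in importantKeys:
            answer[-1][k] = v
        if len(answer[-1]) == len(importantKeys):
            answer.append({})  # current block is complete: start a new dict

In [28]: answer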
Out[28]: [{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}, {}]

In [29]: answer = [d for d in answer if d]

In [30]: answer
Out[30]: [{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}]
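The loop above only starts a new dict once the current one holds both keys, which is why an empty dict is left at the end and has to be filtered out; it also means that a block lacking host or zone would not get a dict of its own, and its value would be combined with, or overwritten by, the next block's. A sketch of an alternative, assuming blocks are separated by blank lines as in the question: itertools.groupby splits the stream of lines into runs of blank and non-blank lines, and each address-object run becomes exactly one dict. read_blocks is a hypothetical helper name, and io.StringIO stands in for the real file:

```python
import io
from itertools import groupby

SAMPLE = """\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text
"""

IMPORTANT_KEYS = {"host", "zone"}


def read_blocks(infile):
    """Yield one dict per address-object paragraph (sketch)."""
    # groupby splits the line stream into runs of blank / non-blank lines.
    for is_blank, group in groupby(infile, key=lambda line: not line.strip()):
        if is_blank:
            continue
        lines = [line.strip() for line in group]
        if not lines[0].startswith("address-object "):
            continue  # skip the surrounding "some other text" paragraphs
        block = {}
        for line in lines[1:]:
            key, _, value = line.partition(" ")
            if key in IMPORTANT_KEYS:
                block[key] = value
        yield block


# io.StringIO stands in for open('path/to/file') here.
with io.StringIO(SAMPLE) as infile:
    answer = list(read_blocks(infile))
print(answer)
```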
