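Python: find blocks of text and put them into an array of dictionaries
I am trying to find blocks of text and put some of the lines from each block into an array of dictionaries. In other words, every block that is found gets its own dictionary. For example, given the following text: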
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text
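I want to store the ip and zone of each block in an array, so that I end up with [[host:ip1,zone:zone1],[host:ip2,zone:zone2]].
I tried looping over the text file, but I could not get the loop to handle the blocks correctly. I think I need to iterate in some way, but I am not sure how. What I ended up with was a single array containing every item, from the line of the first address-object up to some keyword. I need a loop for each address-object that starts the next one when it reaches a blank line.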
3 Answers
0
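First, you need a very specific regular expression that describes those blocks.
Once you have found those specific blocks, a much simpler regular expression can extract the items you are interested in.
Working example: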
import re

# The patterns below rely on a blank line before each block
# and on indented lines inside it.
inp = '''\
some
other
text

address-object object1
    name "name1"
    uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    uuid a5c02150-a47e-748d-0100-18c24100da5e
    zone zone2
    host ip2
    exit

some
more text'''

# Outer pattern finds each block; inner pattern pulls out host and zone.
print(
    [dict(re.findall(r'(?m)^\s+(host|zone)\s+(\S+)', block.group(1)))
     for block in re.finditer(r'(?m)^\s*$\n^(address-object\b[\s\S]+?^\s+exit\b)', inp)]
)
Output:
[{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}]
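Alternatively, with a small tweak, you can capture every field of a block at once:
pat = r'(?m)^\s*$\n^address-object\b.*\r?\n([\s\S]+?)\s+^\s+exit\b'
for b in re.finditer(pat, inp):
    # Split each line of the block into key and value at the first space.
    print({k: v for k, _, v in
           (e.strip().partition(' ') for e in b.group(1).splitlines())})
Output: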
{'name': '"name1"', 'uuid': '4ac9cf52-02b5-eecf-0100-18c24100da5e', 'zone': 'zone1', 'host': 'ip1'}
{'name': '"name2"', 'uuid': 'a5c02150-a47e-748d-0100-18c24100da5e', 'zone': 'zone2', 'host': 'ip2'}
0
One possible solution uses the re module:
import re
text = """\
some
other
text
address-object object1
name "name1"
uuid 4ac9cf52-02b5-eecf-0100-18c24100da5e
zone zone1
host ip1
exit
address-object object2
name "name2"
uuid a5c02150-a47e-748d-0100-18c24100da5e
zone zone2
host ip2
exit
some
more text
"""
pat = r"\s+(zone|host)\s+(.+)"
out = re.findall(pat, text)
out = [dict(t) for t in zip(out[::2], out[1::2])]
print(out)
Output:
[{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}]
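Note that pairing consecutive matches with `zip(out[::2], out[1::2])` only works when every block contains exactly one `zone` line and one `host` line. Here is a minimal sketch of a variant that groups the matches per block instead. It assumes blocks are separated by blank lines, as the question describes; the name `parse_blocks` and its `keys` parameter are made up for illustration.

```python
import re

def parse_blocks(text, keys=("zone", "host")):
    """Return one dict per address-object block, keeping only `keys`."""
    # Hypothetical helper: assumes blank lines separate the blocks.
    alternatives = "|".join(map(re.escape, keys))
    key_pat = re.compile(rf"^\s*({alternatives})\s+(\S+)", re.M)
    result = []
    for chunk in re.split(r"\n\s*\n", text):
        if not chunk.lstrip().startswith("address-object"):
            continue  # surrounding text that is not a block
        result.append(dict(key_pat.findall(chunk)))
    return result

sample = """\
some
text

address-object object1
    name "name1"
    zone zone1
    host ip1
    exit

address-object object2
    name "name2"
    zone zone2
    exit

more text
"""

blocks = parse_blocks(sample)
print(blocks)
```

In this sample the second block has no `host` line, so its dictionary simply lacks that key rather than shifting the pairing of later blocks.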
0
importantKeys = {'host', 'zone'}
with open('path/to/file') as infile:
    answer = [{}]
    for line in infile:
        # Split each line into key and value at the first space.
        k, _, v = line.strip().partition(' ')
        if k in importantKeys:
            answer[-1][k] = v
        # Once every important key has been seen, start a new dictionary.
        if len(answer[-1]) == len(importantKeys):
            answer.append({})
The result is:
In [28]: answer
Out[28]: [{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}, {}]
In [29]: answer = [d for d in answer if d]
In [30]: answer
Out[30]: [{'zone': 'zone1', 'host': 'ip1'}, {'zone': 'zone2', 'host': 'ip2'}]
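The loop above starts a new dictionary only after both keys have been seen, so it leaves a trailing empty dictionary and would merge blocks if one of them lacked a key. As a rough sketch of an alternative, the same line-by-line idea can key on the `address-object` and `exit` lines instead. The function name `parse_address_objects` and the `wanted` parameter are assumptions for illustration, not part of the answer above.

```python
def parse_address_objects(lines, wanted=("host", "zone")):
    """Collect the `wanted` keys of every address-object ... exit block."""
    blocks = []
    current = None  # dict for the block being read, None outside a block
    for line in lines:
        key, _, value = line.strip().partition(" ")
        if key == "address-object":
            current = {}             # a new block starts here
        elif key == "exit" and current is not None:
            blocks.append(current)   # the block is complete
            current = None
        elif current is not None and key in wanted:
            current[key] = value
    return blocks

sample = """\
some
host outside-any-block
address-object object1
    name "name1"
    zone zone1
    host ip1
    exit
address-object object2
    name "name2"
    host ip2
    exit
more text
"""

result = parse_address_objects(sample.splitlines())
print(result)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample.splitlines()`.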