Parsing a file into a dictionary with Python

Posted 2024-04-28 10:27:39

I have a file; a small part of it is shown below:

Clutch001
Albino X Pastel
Bumble Bee X Albino Lesser
Clutch002
Bee X Fire Bee
Albino Cinnamon X Albino
Mojave X Bumble Bee
Clutch003
Black Pastel X Banana Ghost Lesser
....

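The number of lines between one ClutchXXX and the next ClutchXXX can vary, but it is never zero. Is it possible to take a particular string from the file as a key (in my case ClutchXXX) and use the text up to the next occurrence of that pattern as the dictionary value? I would like to get a dictionary like this: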

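{'Clutch001': 'Albino X Pastel, Bumble Bee X Albino Lesser',
 'Clutch002': 'Bee X Fire Bee, Albino Cinnamon X Albino, Mojave X Bumble Bee',
 'Clutch003': 'Black Pastel X Banana Ghost Lesser'}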

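The part I am most interested in is how to store the string pattern as the key and the text that follows it as the value. Any suggestions or pointers to useful methods would be appreciated.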


3 answers

Collect the lines in a list, while also storing that list in the dictionary:

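d = {}
values = None  # list for the current key; stays None until the first key line
with open(filename) as inputfile:
    for line in inputfile:
        line = line.strip()
        if line.startswith('Clutch'):
            # bind the same new list to both the dict entry and `values`
            values = d[line] = []
        elif line and values is not None:
            # skip blank lines and anything that precedes the first key
            values.append(line)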

For the sample above, this gives you:

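{'Clutch001': ['Albino X Pastel', 'Bumble Bee X Albino Lesser'],
 'Clutch002': ['Bee X Fire Bee', 'Albino Cinnamon X Albino', 'Mojave X Bumble Bee'],
 'Clutch003': ['Black Pastel X Banana Ghost Lesser']}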
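
For anyone who wants to try the loop without a file on disk, here is a minimal sketch of the same idea wrapped in a function. The name `parse_clutches` and its `prefix` parameter are made up for this illustration, and the in-memory `io.StringIO` sample stands in for the real file; it assumes key lines are exactly those starting with the prefix.

```python
import io


def parse_clutches(lines, prefix="Clutch"):
    """Map each key line to the list of non-blank lines that follow it.

    `parse_clutches` and `prefix` are hypothetical names for this sketch.
    """
    result = {}
    values = None  # list for the current key; None until the first key is seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(prefix):
            # the same list object is stored in the dict and kept in `values`
            values = result[line] = []
        elif values is not None:
            values.append(line)
    return result


# A file-like object built from a shortened version of the sample data.
sample = io.StringIO(
    "Clutch001\n"
    "Albino X Pastel\n"
    "Bumble Bee X Albino Lesser\n"
    "Clutch002\n"
    "Bee X Fire Bee\n"
)
parsed = parse_clutches(sample)
print(parsed)
```

Because the function accepts any iterable of lines, the same call works with an open file object or a plain list of strings.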

After loading the file, though, it is easy to turn all of those lists into single strings:

d = {key: ', '.join(value) for key, value in d.items()}

You can also do the joining while reading the file; I'd use a generator function to process the file in groups:

def per_clutch(inputfile):
    clutch = None
    lines = []
    for line in inputfile:
        line = line.strip()
        if line.startswith('Clutch'):
            if lines:
                yield clutch, lines
            clutch, lines = line, []
        else:
            lines.append(line)
    if clutch and lines:
        yield clutch, lines

Then put all the groups into a dictionary:

with open(filename) as inputfile:
    d = {clutch: ', '.join(lines) for clutch, lines in per_clutch(inputfile)}

Demo of the latter:

>>> def per_clutch(inputfile):
...     clutch = None
...     lines = []
...     for line in inputfile:
...         line = line.strip()
...         if line.startswith('Clutch'):
...             if lines:
...                 yield clutch, lines
...             clutch, lines = line, []
...         else:
...             lines.append(line)
...     if clutch and lines:
...         yield clutch, lines
... 
>>> sample = '''\
... Clutch001
... Albino X Pastel
... Bumble Bee X Albino Lesser
... Clutch002
... Bee X Fire Bee
... Albino Cinnamon X Albino
... Mojave X Bumble Bee
... Clutch003
... Black Pastel X Banana Ghost Lesser
... '''.splitlines(True)
>>> {clutch: ', '.join(lines) for clutch, lines in per_clutch(sample)}
{'Clutch001': 'Albino X Pastel, Bumble Bee X Albino Lesser', 'Clutch002': 'Bee X Fire Bee, Albino Cinnamon X Albino, Mojave X Bumble Bee', 'Clutch003': 'Black Pastel X Banana Ghost Lesser'}
>>> from pprint import pprint
>>> pprint(_)
{'Clutch001': 'Albino X Pastel, Bumble Bee X Albino Lesser',
 'Clutch002': 'Bee X Fire Bee, Albino Cinnamon X Albino, Mojave X Bumble Bee',
 'Clutch003': 'Black Pastel X Banana Ghost Lesser'}
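
import re
from functools import partial
from itertools import groupby
from pprint import pprint

# Each key line yields a distinct match object, so it forms a group of its own;
# consecutive non-key lines all yield None and are grouped together.
key = partial(re.match, r'Clutch\d\d\d')

with open('foo.txt') as f:
    groups = (', '.join(map(str.strip, g)) for k, g in groupby(f, key=key))
    # pair up the alternating key / value groups
    pprint(dict(zip(*[iter(groups)]*2)))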

{'Clutch001': 'Albino X Pastel, Bumble Bee X Albino Lesser',
 'Clutch002': 'Bee X Fire Bee, Albino Cinnamon X Albino, Mojave X Bumble Bee',
 'Clutch003': 'Black Pastel X Banana Ghost Lesser'}
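
The `groupby` version is quite compact, so here is a rough step-by-step sketch of what it relies on. It is not the code above: it uses a boolean key function (the hypothetical `is_key`) so the intermediate groups are easy to print, and `zip(it, it)` in place of `zip(*[iter(groups)]*2)`. It assumes the input starts with a key line and that every key line is followed by at least one value line.

```python
import re
from itertools import groupby

lines = [
    "Clutch001\n",
    "Albino X Pastel\n",
    "Bumble Bee X Albino Lesser\n",
    "Clutch002\n",
    "Bee X Fire Bee\n",
]


def is_key(line):
    """Hypothetical helper: True for lines that look like 'Clutch' + 3 digits."""
    return re.match(r"Clutch\d\d\d", line) is not None


# Step 1: consecutive lines with the same key value end up in one group,
# so the groups alternate between key lines and value lines.
groups = [(k, [line.strip() for line in g]) for k, g in groupby(lines, key=is_key)]

# Step 2: pull two groups at a time from one shared iterator,
# pairing each key group with the value group that follows it.
it = iter(groups)
result = {}
for (_, key_lines), (_, value_lines) in zip(it, it):
    result[key_lines[0]] = ", ".join(value_lines)

print(groups)
print(result)
```

Note that with a boolean key, two adjacent key lines would be merged into one group, which is one reason this sketch only holds under the assumption stated above.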

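As noted in the comments, if you can rely on "Clutch" (or whatever the keyword is) never appearing in the non-key lines, you can use the following: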

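keyword = "Clutch"
with open(filename) as inputfile:
    t = inputfile.read()
    # split() returns an empty first chunk when the file starts with the keyword,
    # so skip empty chunks; s[:3] assumes a three-digit ID right after the keyword
    d = {keyword + s[:3]: s[3:].strip().replace('\n', ', ')
         for s in t.split(keyword) if s.strip()}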

This reads the whole file into memory at once, so it should be avoided if the file might become very large.
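
A small sketch of the split approach on an in-memory string may make two of its details easier to see: the empty first chunk that `str.split` produces when the text begins with the keyword, and the fixed-width ID. The function name `split_by_keyword` is invented for this example, and using `str.partition` to cut the ID at the first newline (instead of slicing three characters) is a variation of mine, not part of the answer above.

```python
def split_by_keyword(text, keyword="Clutch"):
    """Build {key line: joined value lines} by splitting on the keyword.

    Hypothetical helper; assumes the keyword never occurs in value lines.
    """
    result = {}
    for chunk in text.split(keyword):
        if not chunk.strip():
            continue  # the chunk before the first keyword is empty
        # everything up to the first newline is the ID, the rest is the body
        ident, _, body = chunk.partition("\n")
        result[keyword + ident.strip()] = body.strip().replace("\n", ", ")
    return result


text = (
    "Clutch001\n"
    "Albino X Pastel\n"
    "Bumble Bee X Albino Lesser\n"
    "Clutch1234\n"
    "Bee X Fire Bee\n"
)
chunks = text.split("Clutch")
d = split_by_keyword(text)
print(d)
```

The same memory caveat applies here, since the whole text is held as one string before splitting.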
