How do I parse the sections of the following file in Python?

Posted 2024-04-25 14:11:02

I have a file like this:

Hi:
    fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds
    fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds
    Exampples:

    >>fdsfds
    >>ok

    This is it.

Hello:
    fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd
    fdsfdsfdsfdsfds
    fdsfdsfsd

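The Hi section runs from fds... to This is it. The Hello section runs from fds.. to fds.. I only want to get the section under each heading. I came up with the following approach: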

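Start from the : and then look for \n\n, which would give me each section separately. But that won't work, because a section can itself contain the same pattern (for example, the blank lines inside the Hi section). I don't want to use regex or ConfigParser for this. I'm looking for simple parsing. How can I solve this?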


2 Answers

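You can search for the lines that are not indented. This assumes the four-space indentation shown in the sample file:

indent = "    "  # four spaces, matching the sample's indentation
with open('input.txt', 'r') as f:
    for line in f:
        # Heading lines are non-blank and do not start with the indent
        if line.strip() and not line.startswith(indent):
            print(line.rstrip())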
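Building on the same idea, the unindented lines can serve as section boundaries, so each section's body can be collected without regex or ConfigParser. This is only a minimal sketch: it assumes every heading is a non-indented line ending with a colon and that body lines are indented, and the function name parse_sections and the shortened sample string are made up for illustration.

```python
def parse_sections(lines):
    """Group lines into {heading: body_text}.

    Assumption: a heading is a non-blank line that does not start with
    whitespace and ends with ':'; every other line belongs to the most
    recent heading.
    """
    sections = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        is_heading = bool(line) and not line[0].isspace() and line.endswith(":")
        if is_heading:
            current = line[:-1]  # drop the trailing colon
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # Join each body and trim only the blank lines at its edges
    return {name: "\n".join(body).strip("\n") for name, body in sections.items()}


# Hypothetical, shortened version of the file from the question
sample = """Hi:
    first line
    Exampples:

    >>ok

    This is it.

Hello:
    second section
"""

sections = parse_sections(sample.splitlines())
print(list(sections))  # ['Hi', 'Hello']
```

Because the function strips a trailing newline from each line itself, an open file object could probably be passed in directly instead of sample.splitlines(). The indented Exampples: line is not treated as a heading, since it starts with whitespace.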

This is straightforward with a regular expression:

txt='''\
Hi:
    fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds
    fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds
    Exampples:

    >>fdsfds
    >>ok

    This is it.

Hello:
    fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd
    fdsfdsfdsfdsfds
    fdsfdsfsd'''

import re

print(re.findall(r'^(\w+:.*?)(?=^\w+:|\Z)', txt, re.S | re.M))  

Prints:

['Hi:\n    fdsfdsfdsfdsfdsfdsfdsfdsfdfdsfdsfdsfsdfdsfsdfdsfsdfdsfds\n    fdsfdsfdsfdsfdsfdsfdsfsfdsfdsfsdfsdfsdfsdffdsfdsfds\n    Exampples:\n\n    >>fdsfds\n    >>ok\n\n    This is it.\n\n', 'Hello:\n    fdsfdsfdsfdsfdsfdsfdsfdsfdsfsd\n    fdsfdsfdsfdsfds\n    fdsfdsfsd']
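If a mapping from heading to body is more convenient than a list of raw strings, the same kind of pattern can capture the two parts separately. The following is a sketch, assuming headings consist of word characters followed by a colon and a newline, as in the sample; the shortened txt and the names pattern and sections are illustrative only.

```python
import re
import textwrap

# Hypothetical, shortened version of the text from the answer
txt = '''\
Hi:
    aaa
    Exampples:

    >>ok

    This is it.

Hello:
    bbb
    ccc'''

# Same idea as the pattern in the answer, but with separate groups
# for the heading name and the body that follows it
pattern = re.compile(r'^(\w+):\n(.*?)(?=^\w+:|\Z)', re.S | re.M)

sections = {
    # dedent removes the common leading indentation;
    # strip trims blank lines at the start and end of the body
    name: textwrap.dedent(body).strip()
    for name, body in pattern.findall(txt)
}

print(sections['Hello'])
```

Whether to dedent the body is a design choice; dropping the textwrap.dedent call keeps the original indentation of each section.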
