Extracting the lines between headers that repeat throughout a file

Posted 2024-03-29 10:41:28


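I am trying to process a txt file with ~43k lines. Wherever the command *Nset appears in the file, I need to extract and save all the lines that follow it, stopping at the next * command in the file. Each command is followed by a varying number of lines and characters. For example, here is a sample portion of the file: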

*Nset

1, 2, 3, 4, 5, 6, 7,

12, 13, 14, 15, 16,

17, 52, 75, 86, 92,

90, 91, 92 93, 94, 95....

*NEXT COMMAND

 blah blah blah

*Nset

 numbers

*Nset

 numbers

*Command

 irrelevant text

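The code I currently have works when the numbers I need are not sandwiched between two *Nset commands. When an *Nset directly follows the numbers of another *Nset, the code skips that command and the lines that follow it, and I cannot figure out why. When the next command is not an *Nset, it finds the following *Nset and extracts the data just fine.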

import re

# read in the input deck
deck_name = 'master.txt'
deck = open(deck_name,'r')

#initialize variables
nset_data = []
matched_nset_lines = []
nset_count = 0

for line in deck:
     # loop to extract all nset names and node numbers
     important_line = re.search(r'\*Nset,.*',line)
     if important_line :
         line_value = important_line.group() #name for nset
         matched_nset_lines.insert(nset_count,line_value) #name for nset
         temp = []

        # read lines from the found match up until the next *command
         for line_x in deck :
             if not re.match(r'\*',line_x):
                 temp.append(line_x)
             else : 
                 break

         nset_data.append(temp)

     nset_count = nset_count + 1

I am using Python 3.5. Thanks for your help.


1 answer
User
#1 · Posted 2024-03-29 10:41:28

If you only want to extract the lines that belong to the *Nset blocks, you can use the following approach:

In [5]: with open("master.txt") as f:
   ...:     data = []
   ...:     gather = False
   ...:     for line in f:
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [6]: data
Out[6]:
['1, 2, 3, 4, 5, 6, 7,',
 '12, 13, 14, 15, 16,',
 '17, 52, 75, 86, 92,',
 '90, 91, 92 93, 94, 95....',
 'numbers',
 'numbers']
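
As for why the code in the question skips an *Nset that directly follows another one: the inner `for line_x in deck` loop pulls lines from the same file iterator as the outer loop, so the `*` line that triggers the `break` has already been consumed, and the outer loop never gets to test it. Below is a minimal sketch of that effect. It uses an in-memory iterator in place of the file, and the sample lines are made up for illustration.

```python
# Stand-in for the open file: one iterator shared by both loops.
lines = iter(["*Nset, a", "1, 2", "*Nset, b", "3, 4", "*End"])

seen_by_outer = []
for line in lines:
    seen_by_outer.append(line)
    if line.startswith("*Nset"):
        for inner in lines:            # advances the same iterator
            if inner.startswith("*"):
                break                  # this header line is now consumed

print(seen_by_outer)  # ['*Nset, a', '3, 4', '*End']
```

The second header, `*Nset, b`, never reaches the outer loop, so its data lines are treated as ordinary text. A single loop with a flag, as in In [5], avoids the problem because every line is examined exactly once.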

And if you need additional information, the loop from In [5] is easy to extend:

In [7]: with open("master.txt") as f:
   ...:     nset_lines = []
   ...:     nset_count = 0
   ...:     data = []
   ...:     gather = False
   ...:     for i, line in enumerate(f):
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:             nset_lines.append(i)
   ...:             nset_count += 1
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [8]: nset_lines
Out[8]: [0, 14, 18]

In [9]: nset_count
Out[9]: 3

In [10]: data
Out[10]:
['1, 2, 3, 4, 5, 6, 7,',
 '12, 13, 14, 15, 16,',
 '17, 52, 75, 86, 92,',
 '90, 91, 92 93, 94, 95....',
 'numbers',
 'numbers']
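
The question's code stores one list per *Nset (`nset_data`) together with the matched header line, whereas `data` above is a single flat list. If that grouping matters, the same flag idea can be adapted along these lines. This is only a sketch: `collect_nsets` is a hypothetical helper name, and the `nset=A` style headers in the sample are invented for the example, not taken from the original file.

```python
def collect_nsets(lines):
    """Return a list of (header, data_lines) pairs, one per *Nset block."""
    blocks = []
    current = None                     # data list of the block being read
    for raw in lines:
        line = raw.strip()
        if line.startswith("*Nset"):
            current = []               # start a new block for this header
            blocks.append((line, current))
        elif line.startswith("*"):
            current = None             # any other command ends the block
        elif line and current is not None:
            current.append(line)
    return blocks


# Made-up sample input; any iterable of lines works, including a file object.
sample = ["*Nset, nset=A", "1, 2, 3,", "", "4, 5",
          "*Nset, nset=B", "6, 7", "*Command", "ignored"]
blocks = collect_nsets(sample)
for header, rows in blocks:
    print(header, rows)
```

With the real file this would be called as `collect_nsets(f)` inside `with open("master.txt") as f:`, and `len(blocks)` then gives the number of *Nset commands found.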
