Python: Problem reading data lines from a file multiple times

Question

I'm trying to write a Python 2.6 script on Win32 that reads all the text files in a folder and prints only the lines that contain actual data. Here is a sample file:

Set : 1 
Date: 10212009 
12 34 56 
25 67 90
End Set 
+++++++++
Set: 2 
Date: 10222009 
34 56 89 
25 67 89 
End Set

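In this sample file, I only want to print lines 3 and 4 and lines 9 and 10 (the actual data values). The program should do this for every txt file. I wrote the script below and have been modifying it step by step while testing it on a single txt file. My idea is to read the input files one by one and look for a start string. Once it is found, the script starts looking for the end string. When both have been found, it prints the lines between the start string and the end string. It then does the same for the rest of the file before opening the next file.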
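The approach described above can also be done in a single forward pass, with no seek, tell or linecache at all: keep a flag that a start line switches on and an end line switches off, and collect the lines seen while it is on. Below is a minimal sketch of that idea, not the asker's code. It assumes a header is the word Set followed by optional spaces and a colon, because the sample writes the first header as "Set : 1" but the second as "Set: 2", which the literal search string "Set :" used in the script below would not match. It also assumes "Date:" lines are metadata to skip. extract_data_lines is a hypothetical helper name.

```python
import re

# Assumption: a set header looks like "Set : 1" or "Set: 2"
# (optional whitespace between "Set" and the colon).
SET_START = re.compile(r"^\s*Set\s*:")
SET_END = "End Set"


def extract_data_lines(lines):
    """Return the data rows found between each set header and its "End Set".

    Works on any iterable of lines (an open file, a list, ...).
    Header lines, "Date:" lines and blank lines are skipped.
    """
    data = []
    in_set = False
    for line in lines:
        text = line.strip()
        if SET_START.match(text):
            in_set = True          # entering a new set
        elif text.startswith(SET_END):
            in_set = False         # leaving the current set
        elif in_set and text and not text.startswith("Date:"):
            data.append(text)      # a real data row, without its newline
    return data
```

Because the function accepts any iterable of lines, it can be called directly on an open file object or on a plain list, which also makes it easy to test without touching the disk.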

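The problem is that the program reads the first set of data correctly but goes wrong on the later sets in the file. For the second set, it works out the correct number of lines to read but starts printing from the wrong line number.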
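One likely reason for the wrong start line in the second set is visible in the script below: start_line is incremented only in the outer loop, while the inner loop reads the rest of the set with readline() and increments only end_line. After the first set, start_line therefore lags behind the real position by the number of lines the inner loop consumed, so linecache.getline() is asked for the right number of lines but at the wrong offset. (Separately, "for line in inputfile" iterates over the characters of the file name string, not over the file.) A sketch of numbering lines with one counter from enumerate(), assuming the markers "Set" and "End Set" from the sample; find_set_ranges is a hypothetical name:

```python
def find_set_ranges(lines, start_marker="Set", end_marker="End Set"):
    """Return (start_line, end_line) pairs, 1-based, for every set.

    A single counter from enumerate() numbers every line exactly once,
    so it cannot drift the way two separately incremented counters can.
    """
    ranges = []
    start = None
    for lineno, line in enumerate(lines, 1):
        text = line.strip()
        if text.startswith(end_marker) and start is not None:
            ranges.append((start, lineno))   # close the current set
            start = None
        elif text.startswith(start_marker):
            start = lineno                   # remember where the set began
    return ranges
```

On the sample file this gives (1, 5) and (7, 11), so the data rows are lines start + 2 through end - 1 (skipping the Date line), which matches lines 3 and 4 and lines 9 and 10.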

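After some research, I found the following explanations:

I tried using seek and tell to reposition the file for the second iteration of the loop, but this didn't work, because the file is read through a buffer, which throws off the value returned by tell.
Someone said that opening the file in binary mode helps, but it made no difference for me.
I tried opening the file with buffering set to 0, but that didn't work either.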
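Rather than fighting the file buffer with seek() and tell(), one workaround is to read the whole file into a list first (readlines() or list(handle)) and navigate with list indices. Repositioning then just means changing an integer, and for small text files like these the memory cost is negligible. A minimal sketch, assuming the markers "Set", "End Set" and "Date:" from the sample; data_blocks is a hypothetical generator name:

```python
def data_blocks(lines):
    """Yield the data rows of each set as a list.

    Positions are plain list indices, so no seek()/tell() is involved.
    Assumes the markers "Set", "End Set" and "Date:" from the sample file.
    """
    rows = [line.rstrip("\r\n") for line in lines]   # whole file in memory
    i = 0
    while i < len(rows):
        if rows[i].lstrip().startswith("Set"):
            # Look for "End Set", starting on the line after the header.
            j = i + 1
            while j < len(rows) and not rows[j].strip().startswith("End Set"):
                j += 1
            # Keep non-empty rows strictly between header and "End Set",
            # leaving out the "Date:" line.
            yield [r.strip() for r in rows[i + 1:j]
                   if r.strip() and not r.strip().startswith("Date:")]
            # The next header search resumes right after "End Set".
            i = j + 1
        else:
            i += 1
```

With an open file, list(data_blocks(handle)) gives one list of rows per set.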

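My second problem is that when the first set of data is printed, a blank line is inserted between the two lines of data values. How can I get rid of this blank line?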
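The blank line most likely comes from linecache.getline(), which returns the line together with its trailing newline; the Python 2 print statement then adds a second newline. Stripping the line ending before printing (in Python 2: print line.rstrip('\r\n')) or writing the line with sys.stdout.write() avoids the doubled break. The sketch below does the former; write_rows is a hypothetical name. Stripping '\r' as well matters when a file is opened in binary mode ('rb') on Windows, where lines keep their '\r\n' endings.

```python
import sys


def write_rows(rows, out=sys.stdout):
    """Write each row followed by exactly one newline.

    rstrip("\r\n") removes the line ending the row already carries
    (including a '\r' left over from binary mode on Windows), so the
    "\n" added here is the only line break.
    """
    for row in rows:
        out.write(row.rstrip("\r\n") + "\n")
```

Only line-ending characters are removed, so any other trailing whitespace in a row is left untouched.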

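Note: ignore all references to next_run in the code below. I added them while trying to reposition the line being read. Each subsequent search for the start string should begin from where the last end string was found.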
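Resuming the start-string search where the end string was found does not require saving a position at all if both loops read from the same iterator: the inner loop consumes lines up to End Set, and the outer loop simply continues from the next line. A plain break then replaces the goto (which comes from the third-party goto module, not the standard library). A sketch, assuming the markers from the sample file; sets_from_iterator is a hypothetical name:

```python
def sets_from_iterator(lines):
    """Collect each set's rows by sharing ONE iterator between the outer
    search for the header and the inner search for "End Set".

    Because both loops pull from the same iterator, the outer loop resumes
    right after the "End Set" line where the inner loop stopped, so no
    seek()/tell() or line bookkeeping is needed.
    """
    it = iter(lines)
    result = []
    for line in it:                      # outer loop: look for a header
        if not line.strip().startswith("Set"):
            continue
        rows = []
        for inner in it:                 # inner loop: same iterator
            text = inner.strip()
            if text.startswith("End Set"):
                break                    # a plain break replaces the goto
            if text and not text.startswith("Date:"):
                rows.append(text)
        result.append(rows)
    return result
```

The same pattern works on an open file object, since iterating a file yields its lines in order.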

#!C:/Python26 python 

# Import necessary modules 
import os, glob, string, sys, fileinput, linecache 
from goto import goto, label 

# Set working path 
path = 'C:\\System_Data' 


# -------------------- 
# PARSE DATA MODULE 
# -------------------- 

# Define the search strings for data 
start_search = "Set :" 
end_search ="End Set" 
# For Loop to read the input txt files one by one 
for inputfile in glob.glob( os.path.join( path, '*.txt' ) ): 
  inputfile_fileHandle = open ( inputfile, 'rb', 0 ) 
  print( "Current file being read: " +inputfile ) 
  # start_line initializes to first line 
  start_line = 0 
  # After first set of data is extracted, next_run will store the position to read the rest of the file 
  # next_run = 0 
  # start reading the input files, one line by one line 
  for line in inputfile: 
    line = inputfile_fileHandle.readline() 
    start_line += 1 
    # next_run+=1 
    # If a line matched with the start_search string 
    has_match = line.find( start_search ) 
    if has_match >= 0: 
      print ( "Start String found at line number: %d" %( start_line ) ) 
      # Store the location where the search will be restarted 
      # next_run = inputfile_fileHandle.tell() #inputfile_fileHandle.lineno() 
      print ("Current Position: %d" % next_run) 
      end_line = start_line 
      print ( "Start_Line: %d" %start_line ) 
      print ( "End_Line: %d" %end_line ) 
      #print(line) 
      for line in inputfile: 
        line = inputfile_fileHandle.readline() 
        #print (line) 
        end_line += 1 
        has_match = line.find(end_search) 
        if has_match >= 0: 
          print 'End   String found at line number: %d' % (end_line) 
          # total lines to print: 
          k=0 
          # for loop to print all the lines from start string to end string 
          for j in range(0,end_line-start_line-1): 
            print linecache.getline(inputfile, start_line +1+ j ) 
            k+=1 
          print ( "Number of lines Printed: %d " %k ) 
          # Using goto to get out of 2 loops at once 
          goto .re_search_start_string 
    label .re_search_start_string 
    #inputfile_fileHandle.seek(next_run,0) 

  inputfile_fileHandle.close ()
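For comparison, here is a sketch of how the whole script might look without goto, linecache or manual line counters: each file is opened in a with block (so it is closed even if an error occurs), read once from top to bottom, and the per-file work lives in a function. read_data_lines and process_folder are hypothetical names, the marker strings are assumptions based on the sample file, and the folder path is the one from the question.

```python
import glob
import os


def read_data_lines(filename):
    """Return the data rows of one file (hypothetical helper).

    Assumes the layout from the question: a "Set" header, a "Date:" line,
    data rows, then "End Set".
    """
    rows = []
    in_set = False
    with open(filename) as handle:       # closed automatically, even on error
        for line in handle:
            text = line.strip()
            if text.startswith("End Set"):
                in_set = False
            elif text.startswith("Set"):
                in_set = True
            elif in_set and text and not text.startswith("Date:"):
                rows.append(text)
    return rows


def process_folder(path):
    """Map every *.txt file in path to its list of data rows."""
    results = {}
    for name in sorted(glob.glob(os.path.join(path, "*.txt"))):
        results[name] = read_data_lines(name)
    return results


if __name__ == "__main__":
    # Folder taken from the question; adjust as needed.
    for name, rows in sorted(process_folder(r"C:\System_Data").items()):
        print("Current file being read: " + name)
        for row in rows:
            print(row)
```

Returning the rows instead of printing them inside the loops keeps the parsing testable and lets the caller decide how to output them.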

