Python - How do I read lines from a NUL-delimited file?

13 votes
2 answers
6161 views
Asked 2025-04-17 12:49

I usually use the following Python code to read the lines of a file:

f = open('./my.csv', 'r')
for line in f:
    print(line)

But what if the lines in the file are delimited by "\0" instead of "\n"? Is there a Python module that can handle this?

Thanks for any suggestions.

2 Answers

0

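I modified Mark Byers's suggestion so that a file with NUL-delimited lines can be read one line at a time in Python. Because it reads a potentially large file in chunks and returns one line per call, it should be more memory efficient. Here is the Python code, with comments: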

import sys

# Variables for "fileReadLine()"
inputFile = sys.stdin   # The input file. Use "stdin" as an example for receiving data from pipe.
lines = []   # Extracted complete lines (delimited with "inputNewline").
partialLine = ''   # Extracted last non-complete partial line.
inputNewline="\0"   # Newline character(s) in input file.
outputNewline="\n"   # Newline character(s) in output lines.
readSize=8192   # Size of read buffer.
# End - Variables for "fileReadLine()"

# This function reads NUL delimited lines sequentially and is memory efficient.
def fileReadLine():
   """Like the normal file readline but you can set what string indicates newline.

   The newline string can be arbitrarily long; it need not be restricted to a
   single character. You can also set the read size and control whether or not
   the newline string is left on the end of the read lines.  Setting
   newline to '\0' is particularly good for use with an input file created with
   something like "os.popen('find -print0')".
   """
   # Declare that we want to use these related global variables.
   global inputFile, partialLine, lines, inputNewline, outputNewline, readSize
   if lines: 
       # If complete lines have already been extracted, pop the first one and return it with outputNewline appended.
       line = lines.pop(0)
       return line + outputNewline
   # No complete lines are buffered, so try to read more from the input file.
   while True:   # Here "lines" must be an empty list.
       charsJustRead = inputFile.read(readSize)   # The read buffer size, "readSize", could be changed as you like.
       if not charsJustRead:
          # Reached EOF.
          if partialLine:
             # Whatever is left in partialLine is the final line of the input; return it as-is.
             poppedPartialLine = partialLine
             partialLine = ""   # Reset so that the next call to fileReadLine() returns None.
             return poppedPartialLine   # This is the last line of the input file.
          else:
             # EOF reached and partialLine is empty: every line has already been returned. Signal this with None.
             return None
       partialLine += charsJustRead   # If read buffer is not empty, add it to partialLine.
       lines = partialLine.split(inputNewline)   # Split partialLine to get some complete lines.
       partialLine = lines.pop()   # The last item of lines may not be a complete line, move it to partialLine.
       if not lines:
          # No complete line has been read yet, so keep reading.
          continue
       else:
          # At least one complete line is available. Pop the first one and return it with outputNewline appended (this exits the while loop).
          line = lines.pop(0)
          return line + outputNewline


# As an example, read NUL delimited lines from "stdin" and print them out (using "\n" to delimit output lines).
while True:
    line = fileReadLine()
    if line is None: break
    sys.stdout.write(line)   # "write" does not include "\n".
    sys.stdout.flush() 

Hope this helps.

15

If your file is small enough to read into memory all at once, you can use split:

for line in f.read().split('\0'):
    print(line)
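
One detail worth checking with the split approach: if the data ends with a NUL, as the output of `find -print0` does, `split('\0')` yields a trailing empty string. Below is a minimal Python 3 sketch, assuming the records are text and using an in-memory `io.StringIO` as a stand-in for the real file. The helper name `read_nul_records` and the sample contents are made up for illustration.

```python
import io


def read_nul_records(f):
    """Read all of f and return its NUL-delimited records as a list."""
    records = f.read().split('\0')
    # A trailing NUL leaves one empty string at the end; drop it.
    if records and records[-1] == '':
        records.pop()
    return records


# Stand-in for open('./my.csv', 'r'); the contents are invented sample data.
sample = io.StringIO('first\0second\0third\0')
for record in read_nul_records(sample):
    print(record)
```

Empty records in the middle of the data are kept, so only the artifact of the final delimiter is removed.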

If the file is large, you can try this recipe, which was mentioned in the discussion of this feature request:

def fileLineIter(inputFile,
                 inputNewline="\n",
                 outputNewline=None,
                 readSize=8192):
   """Like the normal file iter but you can set what string indicates newline.
   
   The newline string can be arbitrarily long; it need not be restricted to a
   single character. You can also set the read size and control whether or not
   the newline string is left on the end of the iterated lines.  Setting
   newline to '\0' is particularly good for use with an input file created with
   something like "os.popen('find -print0')".
   """
   if outputNewline is None: outputNewline = inputNewline
   partialLine = ''
   while True:
       charsJustRead = inputFile.read(readSize)
       if not charsJustRead: break
       partialLine += charsJustRead
       lines = partialLine.split(inputNewline)
       partialLine = lines.pop()
       for line in lines: yield line + outputNewline
   if partialLine: yield partialLine
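
Under Python 3 the file may well be opened in binary mode, and the same buffering idea can be sketched with bytes. This is an adaptation under stated assumptions, not the code from the linked discussion: the name `iter_nul_records` is hypothetical, records are assumed to be UTF-8 text, and the delimiter is stripped rather than re-appended.

```python
import io


def iter_nul_records(stream, delimiter=b'\0', read_size=8192):
    """Yield delimiter-separated records from a binary stream, one at a time."""
    partial = b''
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break
        partial += chunk
        records = partial.split(delimiter)
        # The last piece may be incomplete; keep it for the next chunk.
        partial = records.pop()
        for record in records:
            yield record.decode('utf-8')
    if partial:
        # Data after the final delimiter (input did not end with NUL).
        yield partial.decode('utf-8')


# A tiny read_size forces records to straddle chunk boundaries.
data = io.BytesIO(b'alpha\0beta\0gamma')
for name in iter_nul_records(data, read_size=4):
    print(name)
```

Decoding each record only after splitting is safe for UTF-8, because a NUL byte never occurs inside a multi-byte sequence.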

I also noticed that your file has a "csv" extension. Python has a built-in CSV module (import csv). It has an attribute called Dialect.lineterminator, but it is currently not implemented in the reader:

Dialect.lineterminator

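The string used to terminate lines produced by the writer. It defaults to '\r\n'.

Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.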
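
Since `csv.reader` accepts any iterable of strings, one possible way around the hard-coded line terminators is to split the records yourself and hand them to the reader. Below is a minimal Python 3 sketch, assuming each NUL-delimited record is exactly one CSV row and that the file fits in memory. The function name `read_nul_csv` and the sample data are invented for illustration.

```python
import csv
import io


def read_nul_csv(f):
    """Parse a file whose CSV rows are separated by NUL instead of newline."""
    records = f.read().split('\0')
    # Skip empty records, such as the one produced by a trailing NUL.
    return list(csv.reader(r for r in records if r))


# Stand-in for the real file; the second row has a quoted comma.
sample = io.StringIO('name,size\0"a,b.txt",10\0c.txt,20\0')
for row in read_nul_csv(sample):
    print(row)
```

For a large file, the same idea should work by passing a generator of records to `csv.reader` instead of the result of a full `read()`.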
