How can I make Python's readline method recognize both line-ending variants?

Posted 2024-05-14 04:15:04


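I'm writing a Python script that needs to read in several different types of files. After opening a file with f = open("file.txt", "r"), I read it line by line with the usual for line in f.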

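This doesn't seem to work for all files. My guess is that some files terminate their lines differently (for example \r\n versus just \r). I could read the whole file into memory and split the string myself, but that is costly and I'd rather avoid it. Is there a way to make Python's readline method recognize both line-ending variants?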


2 answers

You can try reading the lines yourself with a generator that treats any EOL character as a line break:

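def readlines(f):
    """Yield lines from f, splitting on '\r' or '\n'. Blank lines are skipped."""
    line = []
    while True:
        s = f.read(1)  # read one character at a time
        if len(s) == 0:
            # End of file: emit any text left over without a terminator.
            if len(line) > 0:
                yield ''.join(line)
            return
        if s in ('\r', '\n'):
            # '\r\n' shows up as two terminators; the empty piece is dropped.
            if len(line) > 0:
                yield ''.join(line)
            line = []
        else:
            line.append(s)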

for line in readlines(yourfile):  # yourfile is an already opened file object
    print(line)  # process each line here
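
Reading one character at a time can be slow on large files. The following is a minimal sketch of a chunked variant, not part of the original answer. It assumes a text-mode file object that does no newline translation of its own (for example, one opened with open(path, newline="") in Python 3). The names iter_lines, chunk_size and _EOL are made up for illustration. Unlike the generator above, it keeps blank lines.

```python
import io
import re

# Matches any of the three terminators; '\r\n' is listed first so it counts once.
_EOL = re.compile(r"\r\n|\r|\n")


def iter_lines(f, chunk_size=8192):
    """Yield lines from f, treating '\\r', '\\n' and '\\r\\n' as terminators."""
    pending = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # A trailing '\r' may be the first half of '\r\n': wait for more data.
        if pending.endswith("\r"):
            continue
        parts = _EOL.split(pending)
        pending = parts.pop()  # last piece is an unfinished line (maybe empty)
        yield from parts
    if pending:
        parts = _EOL.split(pending)
        if parts[-1] == "":
            parts.pop()  # the file ended with a terminator
        yield from parts


if __name__ == "__main__":
    sample = io.StringIO("unix\nmac\rwindows\r\nlast", newline="")
    for line in iter_lines(sample, chunk_size=4):
        print(repr(line))
```

Only one chunk plus one partial line is held in memory at a time, which avoids the cost of reading and splitting the whole file.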

Use universal newline support; see http://docs.python.org/library/functions.html#open

In addition to the standard fopen() values mode may be 'U' or 'rU'. Python is usually built with universal newline support; supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'. All of these external representations are seen as '\n' by the Python program. If Python is built without universal newline support a mode with 'U' is the same as normal text mode. Note that file objects so opened also have an attribute called newlines which has a value of None (if no newlines have yet been seen), '\n', '\r', '\r\n', or a tuple containing all the newline types seen.
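
The passage quoted above describes Python 2. In Python 3, text mode applies universal newlines by default (newline=None), and the 'U' flag was removed in Python 3.11. The following is a small sketch under that assumption. The helper names read_lines and demo are hypothetical, and the sample file is written as raw bytes so the terminators are stored exactly as given.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of path with the line terminators stripped."""
    # newline=None (the default) translates '\r', '\n' and '\r\n' to '\n'.
    with open(path, "r", newline=None) as f:
        return [line.rstrip("\n") for line in f]


def demo():
    """Write a file mixing all three conventions and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"unix\nmac\rwindows\r\nlast")
        return read_lines(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

With this default, the ordinary for line in f loop from the question should handle all three conventions without a custom reader.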
