splitlines() and iterating over an open file give different results

with open('test.txt', 'wb') as f:  # simulate a file with weird end-of-lines
    f.write(b'abc\r\r\ndef')
with open('test.txt', 'rb') as f:
    for l in f:
        print(l)
# b'abc\r\r\n'
# b'def'
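A minimal sketch of the difference, using io.BytesIO in place of the real file (an assumption made only to avoid touching disk; a file opened with 'rb' splits the same way). Iterating a binary stream ends a line only after b'\n', whereas bytes.splitlines() also treats a lone b'\r' as a line boundary:

```python
import io

data = b'abc\r\r\ndef'

# Iterating a binary stream ends each line only after b'\n'.
from_stream = list(io.BytesIO(data))
print(from_stream)                     # [b'abc\r\r\n', b'def']

# bytes.splitlines() also breaks at a lone b'\r', producing an extra empty line.
print(data.splitlines())               # [b'abc', b'', b'def']
print(data.splitlines(keepends=True))  # [b'abc\r', b'\r\n', b'def']
```

So if you already hold the bytes in memory and want the file-iteration behaviour, `io.BytesIO(data).readlines()` is one way to get it.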

3 answers

User
#1 · edited 2024-05-14 14:44:07

I would iterate like this:
text = b'abc\r\r\ndef'
results = text.split(b'\r\r\n')  # the separator itself is dropped
for r in results:
    print(r)
# b'abc'
# b'def'
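Note that bytes.split() only breaks on the exact separator it is given, so splitting on b'\r\r\n' handles this sample but leaves ordinary b'\n' or b'\r\n' endings untouched. A small check (the `mixed` sample is made up for illustration):

```python
data = b'abc\r\r\ndef'
print(data.split(b'\r\r\n'))   # [b'abc', b'def']

# Only the exact b'\r\r\n' sequence is a separator; b'\n' and b'\r\n' are not.
mixed = b'abc\r\r\ndef\nghi\r\njkl'
print(mixed.split(b'\r\r\n'))  # [b'abc', b'def\nghi\r\njkl']
```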

User
#2 · edited 2024-05-14 14:44:07

Why not just split it yourself:
input = b'\nabc\r\r\r\nd\ref\nghi\r\njkl'
result = input.split(b'\n')
print(result)
# [b'', b'abc\r\r\r', b'd\ref', b'ghi\r', b'jkl']
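This drops the trailing \n. If you really need it, you can add it back to each line afterwards; for the last line you have to check whether the input actually ended with one. Like this: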
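fixed = [bstr + b'\n' for bstr in result]
if input.endswith(b'\n'):
    fixed.pop()                 # the empty piece after a final \n is not a line
else:
    fixed[-1] = fixed[-1][:-1]  # the last line had no \n, so drop the one added above
print(fixed)
# [b'\n', b'abc\r\r\r\n', b'd\ref\n', b'ghi\r\n', b'jkl']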
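The split-and-restore steps above can be wrapped in one function and checked against io.BytesIO, which splits exactly like a file opened in 'rb' mode. This is a sketch; the name `split_like_file` is hypothetical, and the empty-input guard is an addition so that empty data yields no lines, as an empty file does:

```python
import io
from typing import List


def split_like_file(data: bytes) -> List[bytes]:
    """Split bytes the way iterating a binary file does: only after b'\\n'."""
    if not data:
        return []                   # an empty file yields no lines
    lines = [piece + b'\n' for piece in data.split(b'\n')]
    if data.endswith(b'\n'):
        lines.pop()                 # the empty piece after the final b'\n' is not a line
    else:
        lines[-1] = lines[-1][:-1]  # last line had no b'\n'; remove the one added above
    return lines


sample = b'\nabc\r\r\r\nd\ref\nghi\r\njkl'
print(split_like_file(sample))
# [b'\n', b'abc\r\r\r\n', b'd\ref\n', b'ghi\r\n', b'jkl']
print(split_like_file(sample) == io.BytesIO(sample).readlines())  # True
```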
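Another variant uses a generator. It yields one line at a time instead of building the whole list, which is lighter on memory for large inputs (the input bytes themselves still have to be loaded), and the loop reads much like the original: for l in bin_split(input):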
def bin_split(input_str):
    start = 0
    while True:
        found = input_str.find(b'\n', start) + 1  # index just past the next \n, or 0 if none
        if 0 < found < len(input_str):
            yield input_str[start:found]
            start = found
        else:
            yield input_str[start:]  # last line, with or without a trailing \n
            break
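bin_split still needs the complete bytes object in memory. If the data arrives in pieces (for example read in fixed-size blocks), the same "cut after each b'\n'" idea can be applied across chunk boundaries. This is a sketch under that assumption; `iter_lines_from_chunks` is a hypothetical name, and for an actual file simply iterating the file object already streams lines this way:

```python
from typing import Iterable, Iterator


def iter_lines_from_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield b'\\n'-terminated lines from a stream of byte chunks."""
    pending = b''
    for chunk in chunks:
        pending += chunk
        start = 0
        while True:
            end = pending.find(b'\n', start)
            if end == -1:
                break                      # no complete line left in the buffer
            yield pending[start:end + 1]   # keep the b'\n', like file iteration
            start = end + 1
        pending = pending[start:]          # carry an incomplete line into the next chunk
    if pending:
        yield pending                      # final line without a trailing b'\n'


chunks = [b'abc\r', b'\r\nd', b'ef\n', b'gh']
print(list(iter_lines_from_chunks(chunks)))  # [b'abc\r\r\n', b'def\n', b'gh']
```

With a file opened in 'rb' mode, the chunks could come from `iter(lambda: f.read(65536), b'')`.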

User
#3 · edited 2024-05-14 14:44:07

There are several ways to do this, but none of them is particularly fast.

If you want to keep the line endings, you can try the re module:

import re

lines = re.findall(r'[\r\n]+|[^\r\n]+[\r\n]*', text)
# or equivalently
line_split_regex = re.compile(r'[\r\n]+|[^\r\n]+[\r\n]*')
lines = line_split_regex.findall(text)
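The pattern above is written for str text and groups any run of \r and \n characters together, so it merges a blank line into the previous match and also breaks at a lone \r; that matches the question's sample but not file iteration in general. A sketch of a bytes pattern (an alternative, not the answer's) that reproduces binary file iteration, compared with the answer's pattern on a made-up sample:

```python
import io
import re

# One match per line: everything up to and including a b'\n',
# or a final run of bytes with no b'\n' after it.
FILE_LINE = re.compile(rb'[^\n]*\n|[^\n]+')

data = b'a\n\nb\rc'
print(FILE_LINE.findall(data))  # [b'a\n', b'\n', b'b\rc']
print(FILE_LINE.findall(data) == io.BytesIO(data).readlines())  # True

# The answer's pattern (as a bytes pattern) merges the blank line into the
# first match and splits at the lone b'\r'.
ANSWER = re.compile(rb'[\r\n]+|[^\r\n]+[\r\n]*')
print(ANSWER.findall(data))     # [b'a\n\n', b'b\r', b'c']
```

A str pattern cannot be applied to bytes (re raises TypeError), which is why both patterns here use the rb'' prefix.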

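If you need the line endings and the input is very large, you may want to iterate over the matches instead of building a list: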

for r in re.finditer(r'[\r\n]+|[^\r\n]+[\r\n]*', text):
    line = r.group()
    # do stuff with line here

If you don't need the line endings, it is simpler:

lines = list(filter(None, text.splitlines()))
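filter(None, ...) removes every empty string, so besides the spurious empty line produced by \r\r\n it also drops lines that were genuinely blank. If blank lines matter, one option (a sketch, shown on a made-up bytes sample) is to keep the file's line boundaries and strip only the trailing \r and \n bytes:

```python
import io

data = b'abc\r\r\ndef\n\nghi'

# filter(None, ...) drops the real blank line between b'def' and b'ghi' as well.
filtered = list(filter(None, data.splitlines()))
print(filtered)  # [b'abc', b'def', b'ghi']

# File-style boundaries (split after b'\n' only), then strip trailing \r/\n bytes.
stripped = [line.rstrip(b'\r\n') for line in io.BytesIO(data)]
print(stripped)  # [b'abc', b'def', b'', b'ghi']
```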

If you only iterate over the result (or are on Python 2), you can leave out the list() call:

for line in filter(None, text.splitlines()):
    pass # do stuff with line
