Parsing a large text file with regular expressions

2 answers

User

Reply #1 · Edited 2024-04-25 14:25:44

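This reads the file in chunks of chunksize characters (the file is opened in text mode), which avoids the memory problems that come from reading too much of the file at once: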

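import re


def open_delimited(filename, delimiter, *args, **kwargs):
    """
    http://stackoverflow.com/a/17508761/190597
    """
    with open(filename, *args, **kwargs) as infile:
        chunksize = 10000
        # Text left over from the previous chunk that has not yet been
        # terminated by a delimiter.
        remainder = ''
        # Call infile.read(chunksize) repeatedly until it returns '' (EOF).
        for chunk in iter(lambda: infile.read(chunksize), ''):
            pieces = re.split(delimiter, remainder + chunk)
            # Every piece except the last is complete.
            for piece in pieces[:-1]:
                yield piece
            # The last piece may continue in the next chunk.
            remainder = pieces[-1]
        if remainder:
            yield remainder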

filename = 'post.txt'
for chunk in open_delimited(filename, '##', 'r'):
    print(chunk)
    print('-'*80)
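
To see why the remainder is carried over between reads, here is a minimal sketch of the same loop run on an in-memory stream with a deliberately tiny chunk size. The function name split_stream, the chunksize parameter, and the sample text are assumptions made for illustration; they are not part of the answer above.

```python
import io
import re


def split_stream(stream, delimiter, chunksize=4):
    """Yield delimiter-separated pieces, reading chunksize characters at a time."""
    remainder = ''
    for chunk in iter(lambda: stream.read(chunksize), ''):
        pieces = re.split(delimiter, remainder + chunk)
        # All pieces but the last are known to be complete.
        for piece in pieces[:-1]:
            yield piece
        # Keep the unfinished tail so a delimiter cut in half by a
        # chunk boundary is reassembled on the next read.
        remainder = pieces[-1]
    if remainder:
        yield remainder


# With chunksize=4 the reads are 'alph', 'a##b', 'eta#', '#gam', 'ma',
# so the second '##' is split across two reads.
text = 'alpha##beta##gamma'
pieces = list(split_stream(io.StringIO(text), '##'))
print(pieces)
```

This works cleanly for a fixed-length delimiter such as '##'. With a variable-length pattern such as '#+', a match that straddles a chunk boundary may be seen as two separate matches, so the pieces can differ from what a single re.split over the whole file would give.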

User

Reply #2 · Edited 2024-04-25 14:25:44

You can use islice.

from itertools import islice

buffer = 1000  # example value: number of lines to read per batch

with open('file.txt', 'r') as file:
    while True:
        # islice pulls at most `buffer` lines from the file iterator
        to_process = list(islice(file, buffer))
        if not to_process:
            break
        # process to_process list

buffer is the number of lines to read at a time (it must be defined as an int).
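
As a rough sketch of how the batches might then be fed to a regular expression, the loop can be wrapped in a generator. The helper name read_batches, the pattern r'\d+', and the sample data are all hypothetical and chosen only to keep the example self-contained.

```python
import io
import re
from itertools import islice


def read_batches(lines, batch_size):
    """Yield lists of up to batch_size lines from an iterable of lines."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical pattern: extract runs of digits from each line.
pattern = re.compile(r'\d+')

# An in-memory stand-in for a large file, read two lines at a time.
sample = io.StringIO('a 1\nb 22\nc\nd 333\ne 4\n')
batches = list(read_batches(sample, 2))
numbers = [match
           for batch in batches
           for line in batch
           for match in pattern.findall(line)]
print(numbers)
```

Because each batch holds whole lines, this approach suits patterns that never span a line break; records that cross lines are better handled by the delimiter-based reader from the first reply.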
