Reading the same file multiple times in Python

3 votes
3 answers
4675 views
Asked 2025-04-15 17:10

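I need to download a zip archive of text files, dispatch each text file in the archive to other handlers for processing, and finally write the unzipped text files to disk.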

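I have the following code. It opens and closes the same file multiple times, which does not seem elegant. How can I make it more elegant and efficient?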

zipped = urllib.urlopen('www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
   logfile = unzipped.open(f_info)
   handler1(logfile)
   logfile.close()   ## Cannot seek(0). The file like obj does not support seek()
   logfile = unzipped.open(f_info)
   handler2(logfile)
   logfile.close()
   unzipped.extract(f_info)

3 Answers

1

Open the zip file and loop over its members one by one. For each member, read the file, run the handlers on it, and then write it to disk.

Something like this:

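for f_info in unzipped.infolist():
    member = unzipped.open(f_info)
    data = member.read()
    member.close()
    # If you need a file-like object, wrap the data in a cStringIO
    fobj = cStringIO.StringIO(data)
    handler1(fobj)
    fobj.seek(0)  # rewind so handler2 starts at the beginning
    handler2(fobj)
    # Binary mode keeps the bytes exactly as they were in the archive
    with open(f_info.filename, "wb") as fp:
        fp.write(data)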

You get the idea.
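
For anyone on Python 3, where cStringIO no longer exists, here is a minimal sketch of the same idea using io.BytesIO. The function name process_archive and the out_dir parameter are made up for illustration, and it assumes each handler accepts a binary file-like object. Each handler gets its own fresh BytesIO, so no rewinding is needed:

```python
import io
import os
import zipfile


def process_archive(archive_bytes, handlers, out_dir):
    """Read each member once, run every handler on it, then save it to out_dir."""
    written = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as unzipped:
        for f_info in unzipped.infolist():
            if f_info.is_dir():
                continue  # directory entries carry no data
            data = unzipped.read(f_info)  # decompressed bytes of this member
            for handler in handlers:
                # A fresh BytesIO per handler always starts at offset 0.
                handler(io.BytesIO(data))
            # basename() keeps this sketch from writing outside out_dir.
            target = os.path.join(out_dir, os.path.basename(f_info.filename))
            with open(target, "wb") as fp:
                fp.write(data)
            written.append(target)
    return written
```

The trade-off is that the whole member sits in memory while the handlers run, which is usually fine for log-sized text files.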

1

You could write something like:

handler_dispatch(logfile)

and

def handler_dispatch(file):
   for line in file:
      handler1(line)
      handler2(line)

You could even make this more flexible by creating a handler class that holds several handlers (handlerN) and calls each of them from handler_dispatch. Like this:

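class Handler:
    def __init__(self):
        self.handlers = []

    def add_handler(self, handler):
        # Each handler is an object exposing a handle(line) method.
        self.handlers.append(handler)

    def handler_dispatch(self, file):
        # Single pass over the file: every line goes to every handler.
        for line in file:
            for handler in self.handlers:
                handler.handle(line)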
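
To show how such a dispatcher might be used, here is a small self-contained sketch in Python 3. LineCounter and ErrorCollector are hypothetical handlers invented for this example, and the approach only fits handlers that can work one line at a time:

```python
import io


class Handler:
    """Fans each line of a file out to every registered handler."""

    def __init__(self):
        self.handlers = []

    def add_handler(self, handler):
        self.handlers.append(handler)

    def handler_dispatch(self, file):
        for line in file:
            for handler in self.handlers:
                handler.handle(line)


class LineCounter:  # hypothetical handler: counts lines
    def __init__(self):
        self.count = 0

    def handle(self, line):
        self.count += 1


class ErrorCollector:  # hypothetical handler: keeps lines containing "ERROR"
    def __init__(self):
        self.errors = []

    def handle(self, line):
        if "ERROR" in line:
            self.errors.append(line.rstrip("\n"))


def demo():
    dispatcher = Handler()
    counter, collector = LineCounter(), ErrorCollector()
    dispatcher.add_handler(counter)
    dispatcher.add_handler(collector)
    # The file is read exactly once, so nothing ever has to seek back.
    dispatcher.handler_dispatch(io.StringIO("ok\nERROR disk full\nok\n"))
    return counter.count, collector.errors
```

Because the file is consumed in a single pass, this also works on streams that do not support seek().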

5

Your answer is already in your sample code. Just use StringIO to buffer the log file:

zipped = urllib.urlopen('www.abc.com/xyz.zip')
buf = cStringIO.StringIO(zipped.read())
zipped.close()
unzipped = zipfile.ZipFile(buf, 'r')
for f_info in unzipped.infolist():
   logfile = unzipped.open(f_info)
   # Here's where we buffer:
   logbuffer = cStringIO.StringIO(logfile.read())
   logfile.close()

   for handler in [handler1, handler2]:
      handler(logbuffer)
      # StringIO objects support seek():
      logbuffer.seek(0)

   unzipped.extract(f_info)
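
A rough Python 3 equivalent, as a sketch: it assumes the archive has already been downloaded into a bytes object, and run_handlers is a name made up for this example. As far as I know, since Python 3.7 the object returned by ZipFile.open() supports seek() when the underlying archive is seekable, so the sketch rewinds the member directly and only falls back to an io.BytesIO buffer when it cannot:

```python
import io
import zipfile


def run_handlers(archive_bytes, handlers):
    """Run every handler over every member of an in-memory zip archive."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as unzipped:
        for f_info in unzipped.infolist():
            with unzipped.open(f_info) as member:
                # Use the member itself if it can rewind, else buffer it once.
                if member.seekable():
                    stream = member
                else:
                    stream = io.BytesIO(member.read())
                outputs = []
                for handler in handlers:
                    stream.seek(0)  # every handler starts at the beginning
                    outputs.append(handler(stream))
                results[f_info.filename] = outputs
    return results
```

Seeking backwards in a compressed member means decompressing it again from the start, so for large files with many handlers the explicit buffer may still be the cheaper option.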
