Python pyPdf error on close: ValueError: I/O operation on closed file

4 votes
1 answer
5330 views
Asked 2025-04-16 21:59

I can't figure this out.

This function merges content scraped from a website into a single PDF file, using the pyPdf library.

Here is the method:

def mergePdf(self,mainname,inputlist=0):
    """merging the pdf pages
    getting an inputlist to merge or defaults to the class instance self.pdftomerge list"""
    from pyPdf import PdfFileWriter, PdfFileReader
    self._mergelist = inputlist or self.pdftomerge
    self.pdfoutput = PdfFileWriter()

    for name in self._mergelist:
        print "merging %s into main pdf file: %s" % (name,mainname)
        self._filestream = file(name,"rb")
        self.pdfinput = PdfFileReader(self._filestream)
        for p in self.pdfinput.pages:
            self.pdfoutput.addPage(p)
        self._filestream.close()

    self._pdfstream = file(mainname,"wb")
    self._pdfstream.open()
    self.pdfoutput.write(self._pdfstream)
    self._pdfstream.close()

I keep getting this error:

  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 264, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 345, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 645, in getObject
    self.stream.seek(start, 0)
ValueError: I/O operation on closed file

But when I check the state of self._pdfstream, it is reported as open:

<open file 'c:\python27\learn\dive.pdf', mode 'wb' at 0x013B2020>

What am I doing wrong?

Any help would be appreciated.

1 Answer

7

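OK, I found your problem. Calling file() is correct. There is no need to call open() at all, since file() already returns an open file object.

Your problem is that the input files still need to be open when you call self.pdfoutput.write(self._pdfstream), so you need to remove the line self._filestream.close() from the loop. The traceback shows why: write() ends up in getObject, which calls self.stream.seek() on the reader's stream, so the page data is only read from the input file at write time. The closed file in the error is therefore an input file, not self._pdfstream. Keep a reference to each input file and close them after the write.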
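The mechanism can be reproduced without pyPdf. The following is a minimal sketch using only the standard library; LazyReader, get_object, read_after_close and read_before_close are hypothetical names invented for illustration and are not part of pyPdf. It assumes only what the traceback shows: the reader keeps a reference to its stream and seeks into it when an object is requested.

```python
import io


class LazyReader(object):
    """Hypothetical stand-in for a reader that fetches data on demand."""

    def __init__(self, stream):
        # Keep a reference to the stream; nothing is read yet.
        self.stream = stream

    def get_object(self, start, length):
        # Data is read only when requested, like the seek in the traceback.
        self.stream.seek(start, 0)
        return self.stream.read(length)


def read_after_close():
    """Return the error message raised when the stream is closed too early."""
    stream = io.BytesIO(b"%PDF-1.4 example bytes")
    reader = LazyReader(stream)
    stream.close()  # same mistake as closing the input before write()
    try:
        reader.get_object(0, 4)
    except ValueError as exc:
        return str(exc)
    return None


def read_before_close():
    """Read while the stream is still open, then close it."""
    stream = io.BytesIO(b"%PDF-1.4 example bytes")
    reader = LazyReader(stream)
    data = reader.get_object(0, 4)
    stream.close()
    return data
```

Calling read_after_close() returns the message of a ValueError about a closed file, while read_before_close() returns the first four bytes. The order of the read and the close is the only difference between the two functions.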

Addendum: this script reproduces the problem. The first write succeeds, but the second one fails.

from pyPdf import PdfFileReader as PfR, PdfFileWriter as PfW

input_filename = 'in.PDF' # replace with a real file
output_filename = 'out.PDF' # something that doesn't exist

infile = file(input_filename, 'rb')
reader = PfR(infile)
writer = PfW()

writer.addPage(reader.getPage(0))
outfile = file(output_filename, 'wb')
writer.write(outfile)
print "First Write Successful!"
infile.close()
outfile.close()

infile = file(input_filename, 'rb')
reader = PfR(infile)
writer = PfW()

writer.addPage(reader.getPage(0))
outfile = file(output_filename, 'wb')
infile.close() # BAD! the writer still needs to read from infile

writer.write(outfile)
print "You'll get a ValueError before this line"
outfile.close()
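For a merge over many inputs, one possible way to guarantee that every input stays open until the write has finished is contextlib.ExitStack. This is a sketch in Python 3 syntax (ExitStack needs Python 3.3 or later, and open() replaces file()), not the code from the question. with_inputs_open, concatenate and demo are hypothetical helpers, and concatenate is only a placeholder for the real merge step, which would build the readers and the writer and call write() there.

```python
import os
import tempfile
from contextlib import ExitStack


def with_inputs_open(input_paths, output_path, write_output):
    """Open every input, call write_output, and close the inputs afterwards."""
    with ExitStack() as stack:
        # Each input stays open until the outer with block exits.
        inputs = [stack.enter_context(open(p, "rb")) for p in input_paths]
        with open(output_path, "wb") as out:
            write_output(inputs, out)
    # All files are closed here, after the write has finished.


def concatenate(inputs, out):
    # Placeholder for the real merge step (hypothetical, not a PDF merge).
    for stream in inputs:
        out.write(stream.read())


def demo():
    """Write two small files, combine them, and return the combined bytes."""
    tmpdir = tempfile.mkdtemp()
    paths = []
    for i, payload in enumerate([b"first", b"second"]):
        path = os.path.join(tmpdir, "in%d.bin" % i)
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    out_path = os.path.join(tmpdir, "out.bin")
    with_inputs_open(paths, out_path, concatenate)
    with open(out_path, "rb") as f:
        return f.read()
```

The design point is that the inputs are closed by the with block rather than by hand inside the loop, so they cannot be closed before the output is written, even if the merge step raises.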
