Merging PDF files with Python 3

1 vote
2 answers
3090 views
Asked 2025-04-17 16:44

I'm writing a small script that needs to merge many single-page PDF files into one. I'd like it to run on Python 3 and have as few dependencies as possible.

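For the merging step I tried the PyPdf library. Its Python 3 support seems to be problematic, though: it cannot handle PDF files generated by Inkscape, which are the ones I need. I installed the latest git version of PyPdf, but the following test script does not work: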

import PyPDF2

output_pdf = PyPDF2.PdfFileWriter()

with open("testI.pdf", "rb") as input:
    input_pdf = PyPDF2.PdfFileReader(input)
    output_pdf.addPage(input_pdf.getPage(0))

with open("test.pdf", "wb") as output:
    output_pdf.write(output)

It raises the following error:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    output.addPage(input.getPage(0))
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 420, in getPage
    self._flatten()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 574, in _flatten
    self._flatten(page.getObject(), inherit)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 165, in getObject
    return self.pdf.getObject(self).getObject()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 616, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 526, in readFromStream
    value = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 57, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 152, in readFromStream
    obj = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 86, in readObject
    return NumberObject.readFromStream(stream)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 231, in readFromStream
    return FloatObject(name.decode("ascii"))
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 207, in __new__
    return decimal.Decimal.__new__(cls, str(value), context)
TypeError: optional argument must be a context

The same script, however, runs without any problem under Python 2.7.

What am I doing wrong? Is this a bug in the library? Can I work around it without modifying the PyPDF library?

2 Answers

2

Just so you know, there are existing tools that already do what you want:

  • PDFtk
  • PDFjam (my favorite, although it requires LaTeX)
  • You can also call GhostScript directly:
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf
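
If the merge still has to be driven from a Python 3 script, the Ghostscript command above can be wrapped with the standard library alone. The following is a minimal sketch, not a tested recipe: the function names and the gs_binary parameter are hypothetical, and it assumes a Ghostscript executable named gs is available on the PATH.

```python
import subprocess


def build_gs_merge_command(inputs, output, gs_binary="gs"):
    """Return the argument list that merges `inputs` into `output`.

    The flags mirror the command line shown above; `gs_binary` is an
    assumption and may need to be changed on systems where the
    executable has a different name.
    """
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return [
        gs_binary,
        "-dBATCH",
        "-dNOPAUSE",
        "-q",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output}",
        *map(str, inputs),
    ]


def merge_pdfs(inputs, output, gs_binary="gs"):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_gs_merge_command(inputs, output, gs_binary), check=True)
```

Calling merge_pdfs(["file1.pdf", "file2.pdf"], "finished.pdf") would then run the same command as above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.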

3

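I found the answer. In Python 3.3, decimal.Decimal behaves a bit oddly when it is instantiated; see the related StackOverflow question about instantiating the Decimal class. I added a temporary workaround to the PyPDF2 library and submitted a pull request.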
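
To illustrate the kind of workaround meant here, the sketch below shows one possible shape for it. It is not the actual PyPDF2 patch. It assumes, based only on the traceback in the question, that the failing code forwards a context argument to decimal.Decimal.__new__ and that this argument may be None; the class is a hypothetical stand-in for the library's own float wrapper.

```python
import decimal


class FloatObject(decimal.Decimal):
    """Hypothetical stand-in for the library's float wrapper class."""

    def __new__(cls, value="0", context=None):
        # Assumption: the TypeError in the question comes from forwarding
        # context=None. Only pass a context when the caller supplied one.
        if context is None:
            return decimal.Decimal.__new__(cls, str(value))
        return decimal.Decimal.__new__(cls, str(value), context)
```

Omitting the argument instead of passing None keeps the call valid whether or not a given Python version accepts None as a context.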
