运行Python脚本时的问题(pypdf/hex错误)

3 投票
1 回答
875 浏览
提问于 2025-04-16 22:57

我正在尝试用PyPDF模块创建一个Python脚本。这个脚本的功能是:它会取一个名为“Root”的文件夹,把里面所有的PDF文件合并起来,然后把合并后的PDF放到一个名为“Output”的文件夹里,并把它重命名为“Root.pdf”(这个文件夹里包含了被合并的PDF)。接下来,它会对每个子文件夹做同样的事情,最终输出的文件名会和子文件夹的名字一样。

但是在处理子文件夹的时候,我遇到了问题,出现了一个和某些十六进制值有关的错误代码。(看起来是得到了一个空值,而这个空值不是十六进制的)

这是生成的错误代码:

    Traceback (most recent call last):
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMergerV1.py", line 76, in <module>
    files_recursively(path)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMergerV1.py", line 74, in files_recursively
    os.path.walk(path, process_file, ())
  File "C:\Python27\lib\ntpath.py", line 263, in walk
    walk(name, func, arg)
  File "C:\Python27\lib\ntpath.py", line 259, in walk
    func(arg, top, names)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMergerV1.py", line 38, in process_file
    pdf = PdfFileReader(file( filename, "rb"))
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 374, in __init__
    self.read(stream)
  File "C:\Python27\lib\site-packages\pyPdf\pdf.py", line 775, in read
    newTrailer = readObject(stream, self)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 531, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 58, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 153, in readFromStream
    arr.append(readObject(stream, pdf))
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 69, in readObject
    return readHexStringFromStream(stream)
  File "C:\Python27\lib\site-packages\pyPdf\generic.py", line 276, in readHexStringFromStream
    txt += chr(int(x, base=16))
ValueError: invalid literal for int() with base 16: '\x00\x00'

以下是脚本的源代码:

 #----------------------------------------------------------------------------------------------
# Name:        pdfMerger
# Purpose:     Automatic merging of all PDF files in a directory and its sub-directories and
#              rename them according to the folder itself. Requires the pyPDF Module
#
# Current:     Processes all the PDF files in the current directory
# To-Do:       Process the sub-directories.
#
# Version: 1.0
# Author:      Brian Livori
#
# Created:     03/08/2011
# Copyright:   (c) Brian Livori 2011
# Licence:     Open-Source
#---------------------------------------------------------------------------------------------
#!/usr/bin/env <strong class="highlight">python</strong>

import os
import glob
import sys
import fnmatch

from pyPdf import PdfFileReader, PdfFileWriter

output = PdfFileWriter()

path = str(os.getcwd())

x = 0

def process_file(_, path, filelist):
    for filename in filelist:
        if filename.endswith('.pdf'):

            filename = os.path.join(path, filename)
            print "Merging " + filename

            pdf = PdfFileReader(file( filename, "rb"))

            x = pdf.getNumPages()

            i = 0

            while (i != x):

                output.addPage(pdf.getPage(i))
                print "Merging page: " + str(i+1) + "/" + str(x)

                i += 1

                output_dir = "\Output\\"

                ext = ".pdf"
                dir =  os.path.basename(path)
                outputpath = str(os.getcwd()) + output_dir
                final_output = outputpath

                if os.path.exists(final_output) != True:

                                os.mkdir(final_output)
                                outputStream = file(final_output + dir + ext, "wb")
                                os.path.join(outputStream)
                                output.write(outputStream)
                                outputStream.close()

                else:

                                outputStream = file(final_output + dir + ext, "wb")
                                os.path.join(outputStream)
                                output.write(outputStream)
                                outputStream.close()

def files_recursively(topdir):
        os.path.walk(path, process_file, ())

files_recursively(path)

1 个回答

0

看起来你正在读取的PDF文件可能不是有效的PDF文件,或者它们的格式比较特殊,超出了PyPDF的处理能力。你确定你要读取的PDF文件是好的文件吗?

另外,你的代码里有一些奇怪的地方,但这个问题可能特别重要:

output_dir = "\Output\\"

你那里有一个\O的转义序列,这可能不是你想要的。

撰写回答