Problem merging PDF files with the PyPDF2 module

Asked 2025-04-28 19:56

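I built a dictionary from a table; each key maps to one or more PDF file paths. When a key has more than one value, I want to merge those PDFs into a single file named after the key. However, when I try to write the merged file, I get an attribute error: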

'unicode' object has no attribute 'write'. 

I based my code on this post. Can anyone help me see what might be going wrong?

import arcpy, os, PyPDF2, shutil
arcpy.env.overwriteOutput = True

gb_xls = r'P:\Records\GIS\__Databases__\MapIndex\_MapSets_Grantors_Verification_Q.xlsx'
gb_gdb_tbl = r'C:\temp\temp.gdb\_MapSets_Grantors_Verification_Q'
gb_tbl_sort = r'C:\temp\temp.gdb\_MapSets_Grantors_Verification_Q'
gb_fields = ['Actual_SheetLabel','GBSheetLabel','Image_Path_Filename']
gb_dict = {}
v_list = []

lastkey = -1
lastvalue = ""

rows = sorted(arcpy.da.SearchCursor(gb_gdb_tbl,gb_fields))
for row in rows:
    k = row[0]
    v = row[2]
    if k not in gb_dict:
        gb_dict[k] = v
    if k == lastkey:
        v = str(lastvalue) + ', ' + str(v)
        gb_dict[k] = v
    lastkey = k
    lastvalue = v

merged_file = PyPDF2.PdfFileMerger()

for k,v in gb_dict.items():
    new_file = os.path.join(r'D:\GrantorBoxes_Merged_Pdfs',k+'.pdf')
    if len(str(v).split(',')) > 1:
        for i in [v]:
            val =  i.split(',')[0]
            merged_file.append(PyPDF2.PdfFileReader(val, 'rb'))
        merged_file.write(new_file)
    else:
        shutil.copyfile(v,new_file)

Update:

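I now have different code that uses PyPDF2 to merge the PDFs, and it runs without errors. The problem now is that it merges more files into each output than I expect. I want to loop over my dictionary, find every key that has more than one value (PDF file), and merge those values into a single file named after the key. There may be something wrong with my loop or indentation, but I can't see it. Here is the updated code: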

import arcpy, os, PyPDF2, shutil
arcpy.env.overwriteOutput = True

gb_xls = r'P:\Records\GIS\__Databases__\MapIndex\_MapSets_Grantors_Verification_Q.xlsx'
gb_gdb_tbl = r'C:\temp\temp.gdb\_MapSets_Grantors_Verification_Q'
gb_tbl_sort = r'C:\temp\temp.gdb\_MapSets_Grantors_Verification_Q'
gb_fields = ['Actual_SheetLabel','GBSheetLabel','Image_Path_Filename']
gb_dict = {}

lastkey = -1
lastvalue = ""

rows = sorted(arcpy.da.SearchCursor(gb_gdb_tbl,gb_fields))
for row in rows:
    k = row[0]
    v = row[2]
    if k not in gb_dict:
        gb_dict[k] = v
    if k == lastkey:
        v = str(lastvalue) + ',' + str(v)
        gb_dict[k] = v
    lastkey = k
    lastvalue = v

merger = PyPDF2.PdfFileMerger()

for k,v in gb_dict.items():
    v_list = v.split(',')
    if len(v_list) > 1:
        for i in v_list:
            print k,',',i
            input = open(i,'rb')
            merger.append(input)
        output = open(os.path.join(r'D:\GrantorBoxes_Merged_Pdfs',k+'.pdf'), "wb")
        merger.write(output)
        print output
    else:
        new_file = os.path.join(r'D:\GrantorBoxes_Merged_Pdfs',k+'.pdf')
        shutil.copyfile(str(v),new_file)

1 Answer

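I had been using PyPDF2 to work with the PDFs, but then found that arcpy's mapping module can be used directly; it includes some functions for working with PDF documents. Here is the working code I wrote: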

import arcpy, os, shutil
arcpy.env.overwriteOutput = True

gb_xls = r'P:\Records\GIS\__Databases__\MapIndex\_MapSets_Grantors_Verification_Q.xlsx'
gb_gdb_tbl = r'C:\temp\temp.gdb\_MapSets_Grantors_Verification_Q'
gb_tbl_sort = r'C:\temp\temp.gdb\_MapSets_Grantors_Verification_Q'
gb_fields = ['Actual_SheetLabel','GBSheetLabel','Image_Path_Filename']
gb_dict = {}
lastkey = -1
lastvalue = ""

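# Sort the rows so that all records for a key are adjacent, then build
# a dictionary mapping each key to a comma-separated string of PDF paths.
rows = sorted(arcpy.da.SearchCursor(gb_gdb_tbl,gb_fields))
for row in rows:
    k = row[0]  # Actual_SheetLabel: used as the output file name
    v = row[2]  # Image_Path_Filename: path to one PDF
    if k not in gb_dict:
        gb_dict[k] = v
    if k == lastkey:
        # Same key as the previous row: extend the accumulated path string.
        v = str(lastvalue) + ',' + str(v)
        gb_dict[k] = v
    lastkey = k
    lastvalue = v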

# Write one output PDF per key, named after the key.
out_dir = r'D:\GrantorBoxes_Merged_Pdfs'
for k, val in gb_dict.items():
    val_list = val.split(',')
    pdf_path = os.path.join(out_dir, k + '.pdf')
    if len(val_list) > 1:
        # Create the output document only when there is something to merge,
        # and create a new one for every key.
        out_pdf_file = arcpy.mapping.PDFDocumentCreate(pdf_path)
        for v in val_list:
            print k, v
            out_pdf_file.appendPages(v)
        print pdf_path
        out_pdf_file.saveAndClose()
    else:
        # Only one PDF for this key: copy it under the new name.
        shutil.copyfile(str(val), pdf_path)
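
A likely reason the updated PyPDF2 code in the question merges more files than expected is that `merger` is created once, before the loop, so each output also contains everything appended for earlier keys. The arcpy code above avoids this because it creates a new output document for every key. The sketch below illustrates the same per-key pattern without touching any PDFs. `RecordingMerger`, `group_paths`, `merge_plan`, and the sample rows are hypothetical names made up for illustration, not part of PyPDF2 or arcpy; it also groups the paths into lists rather than comma-separated strings, which is an assumption about what would be simpler, not what the original code does.

```python
from collections import defaultdict


class RecordingMerger(object):
    """Hypothetical stand-in for a PDF merger: it only records appended paths."""

    def __init__(self):
        self.pages = []

    def append(self, path):
        self.pages.append(path)


def group_paths(rows):
    """Group (key, path) pairs into a dict mapping each key to a list of paths."""
    groups = defaultdict(list)
    for key, path in rows:
        groups[key].append(path)
    return groups


def merge_plan(groups):
    """Return, for each key with more than one path, the paths that would be merged."""
    plan = {}
    for key, paths in groups.items():
        if len(paths) > 1:
            # A fresh merger per key, so nothing carries over from other keys.
            merger = RecordingMerger()
            for path in paths:
                merger.append(path)
            plan[key] = merger.pages
    return plan


# Made-up sample rows: (key, PDF path)
rows = [
    ("A", "a1.pdf"),
    ("A", "a2.pdf"),
    ("B", "b1.pdf"),
    ("C", "c1.pdf"),
    ("C", "c2.pdf"),
]
plan = merge_plan(group_paths(rows))
```

With the sample rows, only keys "A" and "C" end up in `plan`, each with just its own two paths. Moving the `RecordingMerger()` line above the `for` loop reproduces the accumulation seen in the question: the entry for the second multi-file key would then also list the first key's files.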
