Filling in an encrypted PDF form with Python

1. Using PyPDF2

This is the approach suggested in the second answer to this Stack Overflow question: Batch fill PDF forms from python or bash

# -*- coding: utf-8 -*-
from collections import OrderedDict
from PyPDF2 import PdfFileWriter, PdfFileReader


def _getFields(obj, tree=None, retval=None, fileobj=None):
    """
    Extracts field data if this PDF contains interactive form fields.
    The *tree* and *retval* parameters are for recursive use.

    :param fileobj: A file object (usually a text file) to write
        a report to on all interactive form fields found.
    :return: A dictionary where each key is a field name, and each
        value is a :class:`Field<PyPDF2.generic.Field>` object. By
        default, the mapping name is used for keys.
    :rtype: dict, or ``None`` if form data could not be located.
    """
    fieldAttributes = {'/FT': 'Field Type', '/Parent': 'Parent', '/T': 'Field Name',
                       '/TU': 'Alternate Field Name', '/TM': 'Mapping Name',
                       '/Ff': 'Field Flags', '/V': 'Value', '/DV': 'Default Value'}
    if retval is None:
        retval = OrderedDict()
        catalog = obj.trailer["/Root"]
        # get the AcroForm tree
        if "/AcroForm" in catalog:
            tree = catalog["/AcroForm"]
        else:
            return None
    if tree is None:
        return retval

    obj._checkKids(tree, retval, fileobj)
    for attr in fieldAttributes:
        if attr in tree:
            # Tree is a field
            obj._buildField(tree, retval, fileobj, fieldAttributes)
            break

    if "/Fields" in tree:
        fields = tree["/Fields"]
        for f in fields:
            field = f.getObject()
            obj._buildField(field, retval, fileobj, fieldAttributes)

    return retval


def get_form_fields(infile):
    infile = PdfFileReader(open(infile, 'rb'))
    fields = _getFields(infile)
    return OrderedDict((k, v.get('/V', '')) for k, v in fields.items())


if __name__ == '__main__':
    from pprint import pprint

    pdf_file_name = '2PagesFormExample.pdf'

    pprint(get_form_fields(pdf_file_name))

However, the program fails because the PDF has not been decrypted:

  File "c:\Users\User1\iCloudDrive\fahrgastrechte\fahrgastrechte.py", line 94, in <module>
    pprint(get_form_fields(pdf_file_name))
  File "c:\Users\User1\iCloudDrive\fahrgastrechte\fahrgastrechte.py", line 62, in get_form_fields
    fields = _getFields(infile)
  File "c:\Users\User1\iCloudDrive\fahrgastrechte\fahrgastrechte.py", line 32, in _getFields
    catalog = obj.trailer["/Root"]
  File "C:\Program Files\Python36\lib\site-packages\PyPDF2\generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "C:\Program Files\Python36\lib\site-packages\PyPDF2\generic.py", line 178, in getObject
    return self.pdf.getObject(self).getObject()
  File "C:\Program Files\Python36\lib\site-packages\PyPDF2\pdf.py", line 1617, in getObject
    raise utils.PdfReadError("file has not been decrypted")
PyPDF2.utils.PdfReadError: file has not been decrypted

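I don't understand why decryption is necessary, since for now I only want to read the data. I could understand it if I were writing data. Yet it is possible to write to the PDF's fields with other tools, for example Google Chrome.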
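One possible explanation, offered as an assumption rather than a diagnosis of this particular file: a PDF that only carries permission restrictions is often encrypted with an empty user password, and viewers such as Chrome apply that empty password silently. PyPDF2 does not do this on its own, so the reader has to be decrypted explicitly before `trailer["/Root"]` is touched. The sketch below uses the PyPDF2 1.x names `isEncrypted` and `decrypt`; the helpers `try_decrypt` and `open_form_reader` are hypothetical names introduced here. If the file uses an encryption algorithm that this PyPDF2 version does not support, `decrypt` may raise instead of returning.

```python
def try_decrypt(reader, password=""):
    """Decrypt `reader` in place if needed; return True when it is readable.

    `reader` is expected to behave like a PyPDF2 1.x PdfFileReader.
    """
    if not reader.isEncrypted:
        return True
    # PyPDF2 1.x: decrypt() returns 0 on failure, 1 if the user password
    # matched, and 2 if the owner password matched.
    return reader.decrypt(password) != 0


def open_form_reader(path, password=""):
    """Hypothetical helper: open `path` and decrypt it before any field access."""
    # Imported lazily; this assumes the PyPDF2 1.x API used in the question.
    from PyPDF2 import PdfFileReader

    # The file is left open on purpose: the reader loads objects lazily
    # from the underlying stream.
    reader = PdfFileReader(open(path, "rb"))
    if not try_decrypt(reader, password):
        raise ValueError("could not decrypt %s with the given password" % path)
    return reader
```

With this in place, `get_form_fields` could call `open_form_reader(infile)` instead of constructing `PdfFileReader` directly. Whether the empty password is accepted depends on how the form was protected.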

2. Using pypdftk

To start with, I just wanted to read the form data:

import pypdftk

pdf_file_name = './fahrgastrechteformular.pdf'
data = pypdftk.dump_data_fields(pdf_file_name)

At the moment my system (Windows 10) does not find pdftk.exe when the Python module calls it, so I ran it directly in bash:

pdftk.exe fahrgastrechteformular.pdf dum_data_fields

Here, too, I got an encryption error:

Error: Failed to open PDF file:
   fahrgastrechteformular.pdf
   OWNER PASSWORD REQUIRED, but not given (or incorrect)
Error: Unable to find file.
Error: Failed to open PDF file:
   dum_data_fields
Done.  Input errors, so no output created.
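Two separate things show up in that output. The operation is spelled `dump_data_fields`; the misspelled `dum_data_fields` is treated as a second input file, which accounts for the last "Failed to open PDF file" message. The owner-password error is independent of the typo. As a rough sketch, pdftk can also be driven from Python without pypdftk by locating the executable on `PATH` and parsing its output. The function names below are hypothetical, and the `input_pw` argument only helps if the owner password is actually known.

```python
import shutil
import subprocess


def parse_data_fields(text):
    """Turn `pdftk ... dump_data_fields` output into a list of dicts."""
    fields = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            # A line of three dashes starts a new field record.
            current = {}
            fields.append(current)
            continue
        key, sep, value = line.partition(":")
        if sep and current is not None:
            # Repeated keys (e.g. several FieldStateOption lines) overwrite
            # each other in this simplified version.
            current[key.strip()] = value.strip()
    return fields


def dump_data_fields(pdf_path, owner_password=None):
    """Run pdftk and return the parsed field records."""
    exe = shutil.which("pdftk")  # also resolves pdftk.exe on Windows
    if exe is None:
        raise FileNotFoundError("pdftk was not found on PATH")
    cmd = [exe, pdf_path]
    if owner_password is not None:
        cmd += ["input_pw", owner_password]
    cmd.append("dump_data_fields")
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_data_fields(result.stdout)
```

If `shutil.which` returns `None`, adding the pdftk installation directory to `PATH` is one way to make the executable visible to Python.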

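So to begin with, I only want to read the PDF's form fields. For example, after I fill in the first field with "Berlin Central Station" using Google Chrome, I want to read that value with the Python script above. The next step would be to actually edit the field contents. I hope this is clear; please ask if anything is not.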
