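<p>You should be able to do this with <a href="http://www.unixuser.org/~euske/python/pdfminer/" rel="noreferrer">pdfminer</a>, but it requires some digging into pdfminer's internals and some knowledge of the PDF format (with respect to forms, of course, but also PDF's internal structures such as "dictionaries" and "indirect objects").</p>
<p>This example might help you along (I think it only works for simple cases, with no nested fields etc.):</p>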
<pre><code>import sys
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdftypes import resolve1

filename = sys.argv[1]
with open(filename, 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    # The catalog's AcroForm entry lists the top-level form fields.
    fields = resolve1(doc.catalog['AcroForm'])['Fields']
    for i in fields:
        # Each entry is an indirect object; resolve it to the field dictionary.
        field = resolve1(i)
        # 'T' holds the field name, 'V' its current value.
        name, value = field.get('T'), field.get('V')
        print('{0}: {1}'.format(name, value))
</code></pre>
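<p>EDIT: I forgot to mention: if you need to provide a password, pass it to <code>doc.initialize()</code>. (In newer pdfminer releases, such as pdfminer.six, the password is instead given as the <code>password</code> argument of <code>PDFDocument</code>.)</p>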
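<p>For the nested case mentioned above, here is a rough sketch of one way to walk the field tree. It assumes that child fields sit under a <code>Kids</code> array and that partial names (<code>T</code>) are joined with a period, as the PDF forms model describes. <code>iter_fields</code> and its <code>resolve</code> parameter are hypothetical names of mine, not part of pdfminer, and the sketch ignores inherited values and UTF-16 encoded names.</p>

```python
def iter_fields(fields, resolve=lambda obj: obj, prefix=""):
    """Yield (qualified_name, value) pairs for terminal form fields.

    `resolve` turns an indirect reference into the object it points to.
    The default is the identity function, so plain dicts work as input.
    """
    for ref in fields:
        field = resolve(ref)
        name = field.get("T")
        # Assumption: byte-string names are decodable as UTF-8.
        if isinstance(name, bytes):
            name = name.decode("utf-8", "replace")
        # Join the parent's name and this partial name with a period.
        qualified = ".".join(part for part in (prefix, name) if part)
        kids = [resolve(kid) for kid in resolve(field.get("Kids") or [])]
        # Kids without a 'T' entry are treated as widgets, not child fields.
        child_fields = [kid for kid in kids if "T" in kid]
        if child_fields:
            yield from iter_fields(child_fields, resolve, qualified)
        else:
            yield qualified, field.get("V")
```

<p>With pdfminer this would presumably be called as <code>iter_fields(resolve1(doc.catalog['AcroForm'])['Fields'], resolve=resolve1)</code>, while the file is still open.</p>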