How do I read the entries of an editable PDF with pdfminer, PyPDF2, or any other PDF-mining Python library?

Posted 2024-05-15 12:06:03

Male | A programmer who enjoys writing Python code.

Using pdfminer

from io import BytesIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage


def convert_pdf_to_txt(path):
    """Extract the text drawn in the page content streams of the PDF at `path`."""
    rsrcmgr = PDFResourceManager()
    retstr = BytesIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0      # 0 = no page limit
    caching = True
    pagenos = set()   # empty set = all pages

    with open(path, 'rb') as fp:
        for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages,
                                      password=password, caching=caching,
                                      check_extractable=True):
            interpreter.process_page(page)

    # TextConverter wrote encoded bytes into the BytesIO buffer
    text = retstr.getvalue().decode(codec)
    device.close()
    retstr.close()
    return text

Using PyPDF2


But even after trying both APIs, I only get the hard-coded text of the PDF, not the values in the empty boxes, which are the editable parts of the PDF.

Example: Vessel name: salil

I can read "Vessel name", but instead of salil I get an empty string. The point is to be able to read salil.

Below is the PDF file I want to parse. Please help.
