Reading a PDF in Python and converting it to text

3 answers

User

Answer 1 · edited 2024-05-14 22:06:35

Your expression

("pdftotext %s %s") %( input1, output)

evaluates to

pdftotext //Home//Sai Krishna Dubagunta.pdf //Home//Me.txt

This means the first argument passed to pdftotext is //Home//Sai and the second is Krishna. That obviously won't work.
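To see the splitting without running anything, here is a small sketch that applies POSIX shell word-splitting rules to the same command string via shlex.split. The variable values are the paths from the question; nothing is executed.

```python
import shlex

input1 = "//Home//Sai Krishna Dubagunta.pdf"
output = "//Home//Me.txt"

# The unquoted command string built by the original expression
command = "pdftotext %s %s" % (input1, output)

# shlex.split breaks the string into words the way a POSIX shell would
argv = shlex.split(command)
print(argv)
# The space-separated pieces of the file name end up as separate arguments
print(len(argv) - 1, "arguments instead of 2")
```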

Enclose the arguments in quotes:

os.system("pdftotext '%s' '%s'" % (input1, output))
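Hand-written single quotes still break if a file name itself contains a single quote. One possible alternative, sketched below, is to skip the shell entirely and pass an argument list to subprocess.run. The helper names build_pdftotext_argv and convert are made up for illustration, and the sketch assumes the pdftotext binary is on PATH.

```python
import shlex
import subprocess


def build_pdftotext_argv(pdf_path, txt_path):
    """Return the argument list for one pdftotext call (hypothetical helper)."""
    return ["pdftotext", pdf_path, txt_path]


def convert(pdf_path, txt_path):
    # No shell is involved, so spaces and quotes in the paths reach pdftotext unchanged.
    # check=True raises CalledProcessError if pdftotext exits with a non-zero status.
    subprocess.run(build_pdftotext_argv(pdf_path, txt_path), check=True)


argv = build_pdftotext_argv("//Home//Sai Krishna Dubagunta.pdf", "//Home//Me.txt")
print(argv)

# If a single command string is still needed, shlex.join (Python 3.8+) quotes each argument
print(shlex.join(argv))
```

Calling convert(input1, output) would then replace the os.system line.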

User

Answer 2 · edited 2024-05-14 22:06:35

I think the pdftotext command takes only one argument. Try using:

os.system(("pdftotext %s") % input1)

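See what happens. Hope this helps.

User

Answer 3 · edited 2024-05-14 22:06:35

There are various Python packages for extracting text from a PDF.

pdftotext

The pdftotext package: seems to work pretty well, but it has no options, e.g. for extracting bounding boxes.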

Installation

For Ubuntu:

sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python-dev

Minimal working example

import pdftotext

with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# Iterate over all the pages
for page in pdf:
    print(page)

# Just read the second page (pdftotext 2.x: index the PDF object, starting at 0)
print(pdf[1])

# Or read all the text at once
print("\n\n".join(pdf))
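Since the question is about producing a text file, here is a minimal sketch that writes the extracted pages to disk. It assumes the pdftotext package is installed and that the PDF object can be iterated as a sequence of page strings, as in the loop above. The helper names join_pages and pdf_to_txt are made up for illustration.

```python
from pathlib import Path


def join_pages(pages, separator="\n\n"):
    """Join an iterable of page strings into one string."""
    return separator.join(pages)


def pdf_to_txt(pdf_path, txt_path):
    # Imported here so join_pages stays usable without the package installed
    import pdftotext

    with open(pdf_path, "rb") as f:
        pdf = pdftotext.PDF(f)
        text = join_pages(pdf)

    # Write the combined text as UTF-8
    Path(txt_path).write_text(text, encoding="utf-8")
```

For example, pdf_to_txt("lorem_ipsum.pdf", "lorem_ipsum.txt") would write every page into one file, with pages separated by a blank line.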

PDFminer

Install it with pip install pdfminer.six. A minimal working example is here.
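As a rough sketch of what a pdfminer.six call can look like, the snippet below uses extract_text from pdfminer.high_level. The wrapper name extract_pdf_text and the file check are assumptions added for illustration, not part of the library.

```python
from pathlib import Path


def extract_pdf_text(pdf_path):
    """Return all text of a PDF as one string using pdfminer.six (hypothetical wrapper)."""
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such PDF: {path}")

    # Imported lazily so the path check above works even without pdfminer.six installed
    from pdfminer.high_level import extract_text

    return extract_text(str(path))
```

Usage would be along the lines of print(extract_pdf_text("lorem_ipsum.pdf")).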
