Python 2.7: Trouble using pypdfocr on Windows 7

Posted 2024-04-25 06:36:17

I am trying to use pypdfocr with Python 2.7 on Windows 7.

This is the error message I get when I run pypdfocr from cmd:

C:\Users\chamar.stu>pypdfocr F:\test2.pdf
Starting conversion of F:\test2.pdf
'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
  File "c:\users\chamar.stu\appdata\local\continuum\anaconda2\lib\runpy.py", line 174, in _run_module_as_main
... .... ....
pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
    ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'

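It looks as though I am missing Poppler or Xpdf, but I did install Poppler through PyGObject as recommended. I also added xpdf to my PATH environment variable, as suggested here.

Any suggestions that would help me get past this?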

2 answers

User

Answer 1 · edited 2024-04-25 06:36:17

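The pypdfocr script is probably using the subprocess module to call the pdfimages program, which is one of the poppler command-line utilities rather than part of the poppler library.

I cannot tell whether those utilities are included in the download at the URI you mentioned.

If they are not, you can find prebuilt MS Windows executables of the utilities (for example here).

Make sure the directory where the poppler utilities are installed is in your PATH, so that pypdfocr can find them.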
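One way to confirm the PATH setup before running pypdfocr again is to ask Python where it would find the executables. The following is a minimal sketch, not part of pypdfocr: the helper name find_tools is made up for illustration, and checking for tesseract as well as pdfimages is an assumption based on the traceback above. It falls back to distutils on Python 2.7, where shutil.which is not available.

```python
from __future__ import print_function

try:
    from shutil import which  # Python 3.3+
except ImportError:
    # Python 2.7 fallback; find_executable also appends ".exe" on Windows
    from distutils.spawn import find_executable as which


def find_tools(names):
    """Hypothetical helper: map each executable name to its full path, or None."""
    return {name: which(name) for name in names}


if __name__ == "__main__":
    # pdfimages comes with the poppler/xpdf utilities; tesseract is the OCR engine
    for name, path in sorted(find_tools(["pdfimages", "tesseract"]).items()):
        print("%-10s %s" % (name, path or "NOT FOUND on PATH"))
```

If pdfimages is reported as not found, the directory that contains pdfimages.exe still needs to be added to PATH, and cmd has to be reopened so that it picks up the change.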

User

Answer 2 · edited 2024-04-25 06:36:17

Try downgrading Tesseract from 4.0.0-beta.1 (the version in my case) to a 3.x release whose version string contains only digits and dots.

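The version check built into the pypdfocr package requires every dot-separated component of the version number to be an integer, which is why it fails on '00alpha' (or on '0-beta' in my case).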
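The failure can be reproduced outside pypdfocr. In the sketch below, parse_strict uses the same list comprehension that appears in the traceback, while parse_tolerant is a hypothetical alternative written only for illustration (it is not pypdfocr code) that keeps the leading digits of each component and stops at the first pre-release suffix.

```python
import re


def parse_strict(ver_str):
    # Same expression as line 98 of pypdfocr_tesseract.py in the traceback
    return [int(x) for x in ver_str.split('.')]


def parse_tolerant(ver_str):
    """Hypothetical parser: keep leading digits, stop at a suffix such as '-beta'."""
    parts = []
    for piece in ver_str.strip().split('.'):
        match = re.match(r'(\d+)(.*)', piece)
        if match is None:
            break  # component does not start with a digit
        parts.append(int(match.group(1)))
        if match.group(2):
            break  # anything after the suffix is pre-release numbering
    return parts


if __name__ == "__main__":
    for ver in ["3.05.01", "4.0.0-beta.1", "4.00.00alpha"]:
        try:
            print(ver, "strict:", parse_strict(ver))
        except ValueError as exc:
            print(ver, "strict failed:", exc)
        print(ver, "tolerant:", parse_tolerant(ver))
```

A plain release string such as 3.05.01 passes the strict check, which is why downgrading to a 3.x release avoids the error without touching the package.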
