UnicodeDecodeError when using Tesseract OCR in Python 3.1

Asked 2025-04-12 05:36

I'm trying to write a text recognition program, but I ran into an error.

Here is my code:

import os

from PIL import Image
import pytesseract

print("Enter File/Folder's Full Path")
file = input('> ')
print("Enter output Folder")
outfol = input('> ')
print()
print('Converting...')
print()
# os.system('cd /d ' + outfol) only changed the directory of a throwaway
# cmd.exe process, so build the output path explicitly instead
pdf = pytesseract.image_to_pdf_or_hocr(file, extension='pdf', config='--psm 1')
with open(os.path.join(outfol, 'test.pdf'), 'wb') as f:
    f.write(pdf)  # pdf is bytes by default
print()
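
For reference, `cd` run through `os.system` only changes the working directory of a short-lived child shell, not of the Python process, so a relative file name such as `test.pdf` is still resolved against the directory the script was started from. A minimal sketch of writing the returned bytes into the chosen folder explicitly; `save_pdf` and its parameters are hypothetical names, not part of pytesseract:

```python
import tempfile
from pathlib import Path


def save_pdf(pdf_bytes: bytes, out_dir: str, name: str = "test.pdf") -> Path:
    """Write PDF bytes (e.g. from image_to_pdf_or_hocr) into out_dir."""
    target = Path(out_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target.write_bytes(pdf_bytes)  # binary write, so no text encoding is involved
    return target


if __name__ == "__main__":
    # Placeholder bytes stand in for real OCR output in this demo
    with tempfile.TemporaryDirectory() as tmp:
        path = save_pdf(b"%PDF-1.4 placeholder", tmp)
        print(path.name, path.read_bytes()[:8])
```

Because `image_to_pdf_or_hocr(..., extension='pdf')` returns bytes, writing them in binary mode avoids any decoding step on the output side.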

This is the error message shown in the console:

Traceback (most recent call last):
  File "e:\Desktop\imgrec.py", line 133, in <module>
  File "e:\Desktop\imgrec.py", line 86, in mainmenu
    OCRmenu()
  File "e:\Desktop\imgrec.py", line 118, in OCRmenu
    pdf = pytesseract.image_to_pdf_or_hocr(file, extension='pdf', config='--psm 1')
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 446, in image_to_pdf_or_hocr
    return run_and_get_output(*args)
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 288, in run_and_get_output
    run_tesseract(**kwargs)
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 155, in get_errors
    line for line in error_string.decode(DEFAULT_ENCODING).splitlines()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 109: invalid start byte
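
Reading the traceback bottom-up: Tesseract itself exited with an error, and pytesseract then crashed in `get_errors` while decoding that error message as UTF-8, so the real `TesseractError` text is never shown. Byte `0xa0` can never begin a UTF-8 character (it is a continuation byte), while in a legacy Windows code page such as cp1252 it is a non-breaking space. That the console output uses such a code page is an assumption, not something the post confirms. A small sketch of a tolerant decode; `decode_stderr` and the cp1252 fallback are illustrative choices:

```python
def decode_stderr(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode Tesseract's stderr without crashing on non-UTF-8 bytes.

    The cp1252 fallback is an assumption about the console code page;
    substitute whatever encoding your Windows installation actually uses.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode(fallback)
    except UnicodeDecodeError:
        # Last resort: keep the message readable, mark bad bytes with U+FFFD
        return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    raw = b"Error\xa0opening file"  # made-up bytes containing 0xa0
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print(exc.reason)  # invalid start byte
    print(repr(decode_stderr(raw)))
```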

I tried adding encoding='utf-8' to the pytesseract.image_to_pdf_or_hocr call.

That raised: TypeError: image_to_pdf_or_hocr() got an unexpected keyword argument 'encoding'
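
The `TypeError` follows from the signature: `image_to_pdf_or_hocr` has no `encoding` parameter, and the failing decode happens inside pytesseract's own error handling anyway. One way to see what Tesseract is actually complaining about is to run the same command outside pytesseract and decode stderr leniently. The sketch below assumes the `tesseract` executable is on `PATH` (otherwise use its full path, the same value pytesseract's `tesseract_cmd` setting would hold); `run_tesseract_verbose`, `input.png` and `out` are hypothetical names:

```python
import subprocess
import sys
from typing import List, Tuple


def run_tesseract_verbose(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, stderr decoded leniently)."""
    proc = subprocess.run(cmd, capture_output=True)  # stderr stays as raw bytes
    # errors="replace" turns undecodable bytes such as 0xa0 into U+FFFD
    return proc.returncode, proc.stderr.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Assumed CLI equivalent of the failing call: image, output base name,
    # options, then the "pdf" config that selects PDF output.
    cmd = ["tesseract", "input.png", "out", "--psm", "1", "pdf"]
    try:
        code, message = run_tesseract_verbose(cmd)
        print("exit code:", code)
        print(message)
    except FileNotFoundError:
        print("tesseract not found on PATH; pass its full path instead")
```

If the message mentions the input file, note that the script prompts for a "File/Folder" path; whether a folder is being passed here is not stated in the post.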

By the way, I'm still a beginner.
