UnicodeDecodeError when using Tesseract OCR in Python 3.1
I'm trying to build a text-recognition program, but I ran into an error.
Here is my code:
import os
from PIL import Image
import pytesseract
print("Enter File/Folder's Full Path")
file = input('> ')
print("Enter output Folder")
outfol = input('> ')
print()
print('Converting...')
print()
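# os.system('cd /d ' + outfol) would only change the directory of a temporary child
# shell, not of this Python process, so build the output path from outfol instead.
pdf = pytesseract.image_to_pdf_or_hocr(file, extension='pdf', config='--psm 1')
with open(os.path.join(outfol, 'test.pdf'), 'wb') as f:
    f.write(pdf)  # pdf is bytes by default
print()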
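The prompt accepts either a file or a folder, but `image_to_pdf_or_hocr` is handed that path directly, so one thing worth ruling out is passing a folder (or a mistyped path) to Tesseract, which would likely make it fail. Below is a minimal stdlib-only sketch of checking the input first. The helper names `collect_images` and `pdf_path_for` and the set of image extensions are hypothetical choices for illustration; the code only prepares paths and does not call Tesseract.

```python
from pathlib import Path

# Assumed set of extensions to treat as images; adjust to your needs.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def collect_images(path_str):
    """Return a sorted list of image paths for a file path or a folder path.

    Hypothetical helper: raises FileNotFoundError if the path does not exist.
    """
    # Tolerate surrounding whitespace and the quotes Windows adds on "Copy as path".
    path = Path(path_str.strip().strip('"'))
    if path.is_file():
        return [path]
    if path.is_dir():
        # Keep only regular files whose extension looks like an image.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    raise FileNotFoundError("No such file or folder: " + str(path))


def pdf_path_for(image, out_dir):
    """Build <out_dir>/<image stem>.pdf without changing the working directory."""
    return Path(out_dir) / (Path(image).stem + ".pdf")
```

Each returned path could then be passed as `str(image)` to `pytesseract.image_to_pdf_or_hocr(..., extension='pdf', config='--psm 1')`, with the resulting bytes written to `pdf_path_for(image, outfol)` in `'wb'` mode, producing one PDF per image.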
This is the error message shown in the console:
Traceback (most recent call last):
  File "e:\Desktop\imgrec.py", line 133, in <module>
  File "e:\Desktop\imgrec.py", line 86, in mainmenu
    OCRmenu()
  File "e:\Desktop\imgrec.py", line 118, in OCRmenu
    pdf = pytesseract.image_to_pdf_or_hocr(file, extension='pdf', config='--psm 1')
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 446, in image_to_pdf_or_hocr
    return run_and_get_output(*args)
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 288, in run_and_get_output
    run_tesseract(**kwargs)
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
  File "D:\apps\python\lib\site-packages\pytesseract\pytesseract.py", line 155, in get_errors
    line for line in error_string.decode(DEFAULT_ENCODING).splitlines()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 109: invalid start byte
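Reading the traceback, the `UnicodeDecodeError` is raised inside pytesseract's `get_errors`, while it builds the `TesseractError` in `run_tesseract`. In other words, Tesseract had already exited with an error, and the decode failure is hiding Tesseract's real message. Byte `0xa0` can never start a UTF-8 character; a plausible guess (an assumption, not something the traceback proves) is that Tesseract wrote its message in a Windows code page rather than UTF-8. The sketch below reproduces the failure on a made-up byte string and shows a lenient decoder; `decode_stderr` is a hypothetical helper, not part of pytesseract.

```python
import locale

# Made-up sample: 0xA0 is a non-breaking space in cp1252/latin-1,
# but in UTF-8 it is a continuation byte and cannot start a character.
SAMPLE = b"made-up message\xa0with a non-UTF-8 byte"


def decode_stderr(raw):
    """Decode error-output bytes without raising (hypothetical helper).

    Tries UTF-8 first, then the locale's preferred encoding, and finally
    falls back to UTF-8 with U+FFFD replacement characters.
    """
    for encoding in ("utf-8", locale.getpreferredencoding(False)):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    return raw.decode("utf-8", errors="replace")
```

`SAMPLE.decode("utf-8")` raises the same "invalid start byte" error seen above, while `decode_stderr(SAMPLE)` returns a string: on a cp1252 locale the byte becomes a non-breaking space, and on a UTF-8 locale it becomes `\ufffd`.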
I tried adding encoding='utf-8' to pytesseract.image_to_pdf_or_hocr, but it returned: TypeError: image_to_pdf_or_hocr() got an unexpected keyword argument 'encoding'
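As the TypeError shows, `image_to_pdf_or_hocr` has no `encoding` parameter, and the failing decode happens inside pytesseract itself. One way to see Tesseract's actual error text is to run the `tesseract` command-line program yourself and decode its stderr leniently. This is a sketch assuming the `tesseract` executable is on your PATH; `run_and_report` and `tesseract_pdf_cmd` are hypothetical helpers, and the argument order `tesseract <image> <outputbase> --psm 1 pdf` is meant to roughly mirror what pytesseract runs for `extension='pdf', config='--psm 1'`.

```python
import subprocess


def run_and_report(cmd):
    """Run cmd and return (returncode, stderr_text) (hypothetical helper).

    errors="replace" turns undecodable bytes into U+FFFD instead of
    raising UnicodeDecodeError, so the message is always readable.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.returncode, proc.stderr.decode("utf-8", errors="replace")


def tesseract_pdf_cmd(image_path, output_base):
    """Build a CLI call that writes <output_base>.pdf using page segmentation mode 1."""
    return ["tesseract", image_path, output_base, "--psm", "1", "pdf"]
```

For example, `code, err = run_and_report(tesseract_pdf_cmd(r"C:\path\to\image.png", r"C:\path\to\out\test"))` (placeholder paths) followed by `print(code, err)` should show the exit code and the message that pytesseract failed to decode.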
By the way, I'm still a beginner.