python-tesseract OCR: get only digits

Posted 2024-06-07 04:25:21

Male | A programmer who enjoys writing Python code.

I am using Tesseract OCR through python-tesseract. On the subject of digits, the Tesseract FAQ says:

Use
TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");
BEFORE calling an Init function or put this in a text file called tessdata/configs/digits:
tessedit_char_whitelist 0123456789
and then your command line becomes:
tesseract image.tif outputbase nobatch digits
Warning: Until the old and new config variables get merged, you must have the nobatch parameter too.

python-tesseract does expose the SetVariable method. I tried it, but the OCR result is unchanged:

import tesseract

api = tesseract.TessBaseAPI()
# Whitelist set BEFORE Init(), as the FAQ suggests
api.SetVariable("tessedit_char_whitelist", "0123456789")
api.Init('.', 'eng', tesseract.OEM_DEFAULT)
api.SetPageSegMode(tesseract.PSM_AUTO)

Has anyone got this working, or should I treat it as a bug in python-tesseract?

Tags: api, init, digits, faq, whitelist

1 Answer

User

#1 · Posted 2024-06-07 04:25:21

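OK, I got it working. According to this (unofficial?) documentation for tesseract-ocr, SetVariable() must be called after Init(), even though the official FAQ says the opposite. Calling it after Init() works as expected.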
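
For illustration, here is a minimal sketch of the call order described above. The helper name init_digits_only is hypothetical (it is not part of python-tesseract); it only wraps the two calls from the question so that Init() runs first and the whitelist is applied afterwards. It assumes the api object passed in is a TessBaseAPI instance like the one in the question.

```python
DIGITS = "0123456789"


def init_digits_only(api, datapath, lang, oem):
    """Initialise `api` first, then restrict recognition to digits.

    Hypothetical helper: `api` is assumed to behave like the
    TessBaseAPI object used in the question.
    """
    # Init() comes first. Per the answer above, a whitelist set
    # before Init() does not change the OCR result.
    api.Init(datapath, lang, oem)
    # Only after Init() is the character whitelist applied.
    api.SetVariable("tessedit_char_whitelist", DIGITS)
    return api
```

With python-tesseract installed, this would be called as init_digits_only(tesseract.TessBaseAPI(), '.', 'eng', tesseract.OEM_DEFAULT), followed by api.SetPageSegMode(tesseract.PSM_AUTO) as in the question.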
