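I am trying to OCR standard forms (they were scanned front and back).
I only want to OCR the second image of each scan (the one that contains the text information). Is there a way to detect and separate the two sides so that only the right images get processed? Sorry if I've missed something important. Here is what I have so far: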
import pytesseract as tess
import os
from PIL import Image
import pandas as pd
import tesserocr

path = "/Users/oliviervandhuynslager/PycharmProjects/OCR/DC_SCANS_TEST"  # path to the directory (folder) where the images are located
count = 0
fileName = []  # list that will contain the original filenames
fullText = []  # list to store the OCR result for each file
for imageName in os.listdir(path):
    count = count + 1
    fileName.append(imageName)
fileName.sort()  # os.listdir returns names in arbitrary order, so sort them
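If the scanner always writes the two sides of a form as consecutive files, the image content may not need to be inspected at all: after sorting, the text side is simply every second file. The sketch below assumes that front/back order and that the text side comes second; the helper names `natural_key` and `text_side_files` are hypothetical. It uses a natural sort because a plain `fileName.sort()` orders `scan_10.jpg` before `scan_2.jpg`, which would break the pairing once there are ten or more files.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers, so 'scan_10' follows 'scan_9'."""
    # re.split with a capturing group alternates text, number, text, ...,
    # so items at the same position always have the same type.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def text_side_files(names, start=1, step=2):
    """Return every `step`-th file, starting at index `start`, after natural sorting.

    ASSUMPTION: each form is scanned as two consecutive files (front, back)
    and the side with the text is the second file of each pair.
    """
    ordered = sorted(names, key=natural_key)
    return ordered[start::step]


names = [f"scan_{i}.jpg" for i in range(1, 13)]
print(text_side_files(names))
```

With the question's variables, the second loop could iterate over `text_side_files(fileName)` instead of `os.listdir(...)`.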
#%%
# APPEND (OCR) text from images TO LIST fullText
for imageName in fileName:  # reuse the sorted list so fullText[i] belongs to fileName[i]
    inputPath = os.path.join(path, imageName)
    img = Image.open(inputPath)
    text = tess.image_to_string(img, lang="eng")
    fullText.append(text)
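If the file order cannot be trusted, a cheap content-based check is to measure how much of each page is "ink". This is a minimal sketch with Pillow (already imported in the question), assuming the side without the text information is mostly blank and therefore has far fewer dark pixels; it is a heuristic swapped in for illustration, not a real text detector. The function names and the `min_ratio`/`threshold` defaults are hypothetical and would need tuning against real scans.

```python
from PIL import Image


def ink_ratio(img, threshold=128):
    """Fraction of pixels darker than `threshold` (0-255) in the grayscale image."""
    gray = img.convert("L")
    histogram = gray.histogram()  # 256 pixel counts, one per gray level
    dark = sum(histogram[:threshold])  # levels 0 .. threshold-1
    return dark / (gray.width * gray.height)


def looks_like_text_side(img, min_ratio=0.01, threshold=128):
    """Heuristic: treat the page as the text side if enough of it is dark.

    ASSUMPTION: the other side of each form is mostly blank; both defaults
    are illustrative and need tuning on real scans.
    """
    return ink_ratio(img, threshold) >= min_ratio


# Self-check on synthetic pages: a blank page and one with a dark block.
blank = Image.new("L", (100, 100), 255)
marked = Image.new("L", (100, 100), 255)
marked.paste(0, (0, 0, 20, 10))  # 20 x 10 black rectangle = 2% of the page
print(looks_like_text_side(blank), looks_like_text_side(marked))  # False True
```

In the second loop, wrapping the OCR call in `if looks_like_text_side(img):` would skip the blank side. When a front/back pair is already known, `max((front, back), key=ink_ratio)` picks the denser page without needing an absolute cutoff.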
Here are some example images: