Unable to extract numbers from an image using Tesseract OCR in Python

Asked 2025-04-14 16:04

I'm working on a project that needs to extract numbers from images using Tesseract OCR in Python, but I'm having trouble getting accurate results.

Here is my code:

from PIL import Image, ImageEnhance, ImageOps
import pytesseract

# Load the screenshot
screenshot = Image.open("screenshot2.png")

# Crop the region containing the numbers
bbox = (970, 640, 1045, 675)
cropped_image = screenshot.crop(bbox)

# Enhance contrast
enhancer = ImageEnhance.Contrast(cropped_image)
enhanced_image = enhancer.enhance(2.0)

# Convert to grayscale
gray_image = enhanced_image.convert("L")

# Apply thresholding
thresholded_image = gray_image.point(lambda p: p > 150 and 255)

# Invert colors
inverted_image = ImageOps.invert(thresholded_image)

# Convert to binary
binary_image = inverted_image.convert("1")

# Save the processed image
binary_image.save("processed_image.png")

# Perform OCR
text = pytesseract.image_to_string(binary_image, config="--psm 13")

# Extract numbers from the OCR result
numbers = [int(num) for num in text.split() if num.isdigit()]

print(numbers)

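But the output is just: []

All I want is to extract the two numbers. If I could at least extract all of the text first, I could post-process it to keep only those two numbers.

The steps I have tried so far: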

I've captured screenshots containing numeric values.
I've cropped the screenshots to focus only on the region containing the numbers.
I've enhanced the contrast and converted the images to grayscale to improve OCR accuracy.
I've applied thresholding and inverted the colors to prepare the images for OCR.
I've tried converting the images to binary format for better recognition.

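Despite these preprocessing steps and adjusting the OCR settings (for example, using --psm 13), I still cannot reliably extract the numbers from the image. The OCR output is either the wrong numbers or no numbers at all.

I would appreciate any suggestions or insights on how to improve the accuracy of my OCR extraction. Thanks!

Tags: image-processing, computer-vision, machine-learning, text-extraction, ocr, tesseract, digit-recognition, preprocessing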

1 Answer

I did some research and found this article: https://nanonets.com/blog/ocr-with-tesseract/

!! This code is based on the article by Filip Zelic and Anuj Sable !!

import cv2
import pytesseract

# Read the image (OpenCV loads it in BGR channel order)
image = cv2.imread('screenshot2.png')
if image is None:
    raise FileNotFoundError('screenshot2.png could not be read')

# Coordinates from PIL cropping (left, top, right, bottom)
left = 970
top = 640
right = 1045
bottom = 675

# Convert to OpenCV coordinates (x, y, width, height)
x = left
y = top
width = right - left
height = bottom - top

# Crop the image; NumPy indexing is [rows (y), columns (x)]
cropped_image = image[y:y+height, x:x+width]

# Convert the BGR crop directly to grayscale. A round trip through PIL is
# not needed, and applying COLOR_BGR2GRAY to an RGB array would swap the
# red and blue channel weights.
gray = cv2.cvtColor(cropped_image, cv2.COLOR_BGR2GRAY)

# Perform OCR on the grayscale image
ocr_text = pytesseract.image_to_string(gray)

print(ocr_text)

With this approach, the numbers in the provided image, [-2, -1], are extracted more accurately.
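Since the expected values are negative, one thing worth noting is that the filter in the question, `num.isdigit()`, rejects tokens such as "-2", so it would return an empty list even when the OCR text is correct. Below is a minimal sketch of a parser that keeps the sign. The helper name `extract_integers` and the sample string are made up for illustration, and the normalization of Unicode minus signs is an assumption about what an OCR engine might emit, not something observed in the original image.

```python
import re
from typing import List

# Optional ASCII minus sign followed by one or more digits.
_SIGNED_INT = re.compile(r"-?\d+")

# Dash-like characters that OCR may produce instead of an ASCII hyphen
# (assumption: Unicode minus U+2212 and en dash U+2013 are the likely ones).
_DASH_TABLE = str.maketrans({"\u2212": "-", "\u2013": "-"})


def extract_integers(ocr_text: str) -> List[int]:
    """Return every signed integer found in the OCR text, in order."""
    normalized = ocr_text.translate(_DASH_TABLE)
    return [int(token) for token in _SIGNED_INT.findall(normalized)]


if __name__ == "__main__":
    sample = "-2 -1\n"  # hypothetical OCR output, not a real Tesseract result
    print(extract_integers(sample))
```

Passing `ocr_text` from the code above into `extract_integers` would give a list of ints. Note that a hyphen between two numbers (for example "10-5") is read as a sign on the second number, which may or may not be what you want.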

Thanks for your patience and cooperation in helping find a working solution. If you have any other questions or need more help, feel free to ask!
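P.S. If the default call still returns stray characters, one option that may help is to tell Tesseract that the crop is a single line and to restrict it to digits and the minus sign. The sketch below assumes the `tessedit_char_whitelist` variable is honored by your Tesseract build (some versions ignore it in LSTM mode) and that page segmentation mode 7 suits the crop; neither has been verified against the original screenshot. The helper names `build_digit_config` and `ocr_digits` are made up for illustration.

```python
def build_digit_config(psm: int = 7, allowed: str = "0123456789-") -> str:
    """Build a Tesseract config string limited to the given characters.

    psm 7 asks Tesseract to treat the image as a single text line.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={allowed}"


def ocr_digits(image, psm: int = 7) -> str:
    """Run OCR on an image (PIL image or NumPy array) with the digit config."""
    # Imported here so build_digit_config can be used without pytesseract.
    import pytesseract

    return pytesseract.image_to_string(image, config=build_digit_config(psm))
```

With the code above, this would be called as `ocr_digits(gray)`. Small crops like this one (75 x 35 pixels) may also benefit from being upscaled before OCR, though how much that helps depends on the image.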

Answered 2025-04-14 by Python大师
