Need help with table detection using Tesserocr in Python

Posted 2024-06-02 09:00:03


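I am using Python 3.7 and Tesseract 4.00, and I am trying to do table detection with Tesseract.

In the current scenario, every element at block level is reported as UNKNOWN.

For reference, these are the block types defined in Tesseract: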

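UNKNOWN: Type is not yet known. Keep as the first element.
FLOWING_TEXT: Text that lives inside a column.
HEADING_TEXT: Text that spans more than one column.
PULLOUT_TEXT: Text that is in a cross-column pull-out region.
EQUATION: Partition belonging to an equation region.
INLINE_EQUATION: Partition has an inline equation.
TABLE: Partition belonging to a table region.
VERTICAL_TEXT: Text line runs vertically.
CAPTION_TEXT: Text that belongs to an image.
FLOWING_IMAGE: Image that lives inside a column.
HEADING_IMAGE: Image that spans more than one column.
PULLOUT_IMAGE: Image that is in a cross-column pull-out region.
HORZ_LINE: Horizontal line.
VERT_LINE: Vertical line.
NOISE: Lies outside of any column.
COUNT: Count.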
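Since `BlockType()` prints a bare integer, a small lookup makes the output easier to read. The sketch below is illustrative only: the `BlockType` enum and the `block_type_name` helper are hypothetical names introduced here, and the numeric values assume the types are numbered from 0 in the order listed above (consistent with UNKNOWN being documented as the first element).

```python
from enum import IntEnum


class BlockType(IntEnum):
    """Block types in the order listed above.

    Assumption: values start at 0 and follow the listed order.
    """
    UNKNOWN = 0
    FLOWING_TEXT = 1
    HEADING_TEXT = 2
    PULLOUT_TEXT = 3
    EQUATION = 4
    INLINE_EQUATION = 5
    TABLE = 6
    VERTICAL_TEXT = 7
    CAPTION_TEXT = 8
    FLOWING_IMAGE = 9
    HEADING_IMAGE = 10
    PULLOUT_IMAGE = 11
    HORZ_LINE = 12
    VERT_LINE = 13
    NOISE = 14
    COUNT = 15


def block_type_name(value):
    """Return the symbolic name for an integer block type, or a fallback."""
    try:
        return BlockType(value).name
    except ValueError:
        # Value outside the known range.
        return "INVALID(%d)" % value
```

With this in place, `print(block_type_name(e.BlockType()))` would show a name instead of a number inside the block loop.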

Tesserocr-API

from cv2 import COLOR_BGR2GRAY, GaussianBlur, cvtColor, medianBlur
from PIL import Image
from tesserocr import PyTessBaseAPI, RIL, iterate_level


class TableDetector:

    __TRAINED_DATA_PATH = ...  # Tessdata path (left out of the original post)

    def detect_table(self, image, tx_id, do_pre_process=True):
        result_dec = {}  # returned below; the posted code never fills it in
        try:
            pre_processed_image = image
            if do_pre_process:
                # Convert to grayscale, then apply light denoising.
                pre_processed_image = cvtColor(image, COLOR_BGR2GRAY)
                pre_processed_image = medianBlur(pre_processed_image, 3)
                pre_processed_image = GaussianBlur(pre_processed_image, (3, 3), 0)
            conf_score = 0

            with PyTessBaseAPI(psm=6, oem=1, lang="eng",
                               path=self.__TRAINED_DATA_PATH) as api:
                pil_image = Image.fromarray(pre_processed_image)
                api.SetImage(pil_image)

                # Switches related to table finding.
                api.SetVariable("textord_tabfind_find_tables", "true")
                api.SetVariable("textord_tablefind_recognize_tables", "true")
                api.SetVariable("textord_show_tables", "true")
                api.SetVariable("textord_tablefind_show_stats", "true")
                x = api.AnalyseLayout()
                # level = RIL.BLOCK
                for e in iterate_level(x, RIL.BLOCK):
                    print(e.Orientation())
                    print(e.BlockType())
        except Exception as e:
            # Logger is the poster's own logging helper (not shown in the post).
            Logger.log.error("Error in image_to_data : %s" % e, exc_info=True)

        return result_dec
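The function above only prints each block type and returns `result_dec` without putting anything in it. One possible way to return something useful is to tally the block types as the loop runs. This is a rough sketch, not part of the original post: `summarize_blocks` and `FakeBlock` are hypothetical names, and `TABLE = 6` assumes the block types are numbered from 0 in the order given in the reference list. The helper accepts any iterable of objects that expose `BlockType()`, so it can be tried without Tesseract installed.

```python
from collections import Counter

# Assumption: TABLE is the seventh entry in the block-type list, counting from 0.
TABLE = 6


def summarize_blocks(blocks):
    """Count block types from any iterable of objects exposing BlockType().

    The generator reads BlockType() as each item is produced, so it also
    suits iterators that yield the same object at successive positions.
    """
    counts = Counter(block.BlockType() for block in blocks)
    return {"counts": dict(counts), "has_table": counts[TABLE] > 0}


class FakeBlock:
    """Stand-in for one layout block; used only for demonstration."""

    def __init__(self, block_type):
        self._block_type = block_type

    def BlockType(self):
        return self._block_type


if __name__ == "__main__":
    demo = [FakeBlock(0), FakeBlock(0), FakeBlock(6)]
    print(summarize_blocks(demo))
```

In the posted code, something like `result_dec = summarize_blocks(iterate_level(x, RIL.BLOCK))` could replace the printing loop, assuming `x` is not `None`.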


0 answers

No answers yet.
