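Python receiptparser package - PyPI

An OCR-based parser for receipts and bills

Detailed description of the receiptparser Python project

Receipt Parser

Summary

A receipt and bill parser written in Python. It can be used as a Python module or as a CLI tool.

It was originally based on receipt-parser, but has since been completely rewritten and effectively replaces it.

So far only German receipts are supported, but other countries can be added with a simple YAML configuration file.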

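Recognition rate

To develop this tool, I used 182 receipts of varying quality. Some of them were crumpled, most had been folded, and so on. The results for this set of receipts are: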

Total:             182
Company found:     171
Postal code found: 158
Date found:        159
Amount found:      114
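
To put the counts above in proportion, the short sketch below converts them to percentages of the 182 receipts. The variable names are illustrative only and are not part of receiptparser.

```python
# Counts copied from the results above; the names are illustrative.
total = 182
found = {
    "Company": 171,
    "Postal code": 158,
    "Date": 159,
    "Amount": 114,
}

# Share of receipts for which each field was recognized.
rates = {field: count / total for field, count in found.items()}

for field, rate in rates.items():
    print(f"{field + ':':<13}{rate:6.1%}")
```

This works out to roughly 94% for the company, 87% for the postal code and the date, and 63% for the amount.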

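If your receipts are clean, uncrumpled, and have good contrast, I would expect a success rate of 97%-99%, except for the total amount, which is harder to determine correctly. That may be closer to 75%.

Where applicable, I chose automation and quality over performance. For example, receiptparser scans each image twice, once unsharpened and once sharpened, which improves the recognition rate by about 6% but doubles the scanning time.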
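
The two-pass idea can be sketched roughly as follows. This is not receiptparser's actual code: `recognize` and `sharpen` are hypothetical callables standing in for an OCR step and an image filter, and the merge rule (keep the first pass, fill gaps from the second) is an assumption, since the description does not say how the two results are combined.

```python
from typing import Callable, Dict, Optional

Fields = Dict[str, Optional[str]]


def scan_twice(
    image: object,
    recognize: Callable[[object], Fields],
    sharpen: Callable[[object], object],
) -> Fields:
    """Run recognition on the plain and the sharpened image and merge the results.

    `recognize` and `sharpen` are hypothetical callables supplied by the caller.
    """
    plain = recognize(image)               # first pass: image as-is
    sharpened = recognize(sharpen(image))  # second pass: doubles the scan time

    merged: Fields = {}
    for key in plain.keys() | sharpened.keys():
        # Assumed rule: prefer the unsharpened result, fall back to the sharpened one.
        merged[key] = plain.get(key) or sharpened.get(key)
    return merged
```

A field missed in one pass can still be filled in by the other, which is one plausible way a second scan could raise the recognition rate at the cost of twice the work.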

Installation

Prerequisites

Python3
PIP3
Tesseract

Install via PIP

pip3 install receiptparser

Install via Git

pip3 install -r requirements.txt
pip3 install .

Python usage

from receiptparser.config import read_config
from receiptparser.parser import process_receipt

config = read_config('my_config.yml')
receipt = process_receipt(config, "my_receipt.jpg", out_dir=None, verbosity=0)

print("Filename:   ", receipt.filename)
print("Company:    ", receipt.company)
print("Postal code:", receipt.postal)
print("Date:       ", receipt.date)
print("Amount:     ", receipt.sum)

CLI usage

Examples

A simple example that reads all images (.jpg) from a directory and prints the recognized data to standard output:

receiptparser tests/data/germany/img/

The output can be customized as follows:

receiptparser -v0 --format "{date:%Y-%m-%d} - {company} - {postal} - {sum}.jpg" tests/data/germany/img/

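In this example, -v0 suppresses all output except what you specify in the --format FORMAT parameter. FORMAT is a Python format string. The following values can be used in the format string:

company: the recognized name of the company
postal: the recognized postal code of the company
date: the recognized date of the bill or receipt
sum: the dollar (or euro, or other currency) amount of the bill or receipt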
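
To illustrate how such a format string expands, here is a small sketch using Python's built-in str.format. The field values are made up, and it assumes the date is passed as a date-like object, which the %Y-%m-%d specifier in the example above suggests; how receiptparser fills the fields internally is not shown in this description.

```python
from datetime import date

# Hypothetical values standing in for one recognized receipt.
fields = {
    "date": date(2019, 8, 17),
    "company": "Example GmbH",
    "postal": "10115",
    "sum": 12.34,
}

template = "{date:%Y-%m-%d} - {company} - {postal} - {sum}.jpg"

# str.format hands the part after the colon ("%Y-%m-%d") to the date object's
# __format__ method, which treats it like a strftime pattern.
line = template.format(**fields)
print(line)  # 2019-08-17 - Example GmbH - 10115 - 12.34.jpg
```

A format like this yields one line per receipt that could serve directly as a new file name.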

Syntax

usage: receiptparser [-h][-c CONFIG][--config-file CONFIG_FILE][-t TESSERACT][-f FORMAT][-v {0,1,2}] input

positional arguments:
  input                 file or directory from which images will be read

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        built-in config to use
  --config-file CONFIG_FILE
                        like -c, but point to a file instead
  -t TESSERACT, --tesseract TESSERACT
                        output directory for OCR recognized text (default is to discard)
  -f FORMAT, --format FORMAT
                        format of the recognized output. default is pretty-printing
  -v {0,1,2}, --verbosity {0,1,2}
                        increase output verbosity
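
For readers who want to see how the options fit together, the sketch below rebuilds the documented option set with Python's argparse. It is a reconstruction from the help text above, not receiptparser's real source; in particular the default verbosity of 1 is an assumption, since the help text does not state it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented option set; not receiptparser's real source."""
    parser = argparse.ArgumentParser(prog="receiptparser")
    parser.add_argument("-c", "--config", help="built-in config to use")
    parser.add_argument("--config-file",
                        help="like -c, but point to a file instead")
    parser.add_argument("-t", "--tesseract",
                        help="output directory for OCR recognized text "
                             "(default is to discard)")
    parser.add_argument("-f", "--format",
                        help="format of the recognized output. "
                             "default is pretty-printing")
    # default=1 is an assumption; the help text does not state the default level.
    parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1, 2],
                        default=1, help="increase output verbosity")
    parser.add_argument("input",
                        help="file or directory from which images will be read")
    return parser


# Parse the customized example from the Examples section (shortened format).
args = build_parser().parse_args(
    ["-v0", "--format", "{date:%Y-%m-%d} - {company}.jpg", "tests/data/germany/img/"]
)
print(args.verbosity, args.format, args.input)
```

Note that -v0 is the short option -v with the value 0 attached, which is why it silences output rather than raising verbosity.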


receiptparser 1.1
