Extracting form data from handwritten text with Google Cloud Vision in Python

Posted 2024-05-15 21:22:28


I have an image of a filled-in, handwritten form.

I am trying to extract the form data in this shape:

{
"comments":"nil",
"namefirst":"Jhon",
"last":"Doe",
"mf":"",
"address 1": "PICADALLY LONDON",
"APT":"103",
"City": "London",
"State":"Nil",
"DOB": "",
"AGE": 43,
"Phone Number":"+4464343",
"email":"nil",
"date":"20-03-2012"
}

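However, I cannot get the data out in that form. I can get the bounding boxes of the detected words, but I do not know how to turn them into field/value pairs. I have been stuck on this for 5 days, so any help would be greatly appreciated.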

My code:

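# `response` is the result of a Vision API text detection request.
items = []
lines = {}  # top y of a line -> [(top y, bottom y), [(x, word), ...]]

# text_annotations[0] is the full text; the remaining entries are single words.
for text in response.text_annotations[1:]:
    top_x_axis = text.bounding_poly.vertices[0].x
    top_y_axis = text.bounding_poly.vertices[0].y
    bottom_y_axis = text.bounding_poly.vertices[3].y

    # Register a new candidate line keyed by this word's top edge.
    if top_y_axis not in lines:
        lines[top_y_axis] = [(top_y_axis, bottom_y_axis), []]

    # Attach the word to the first line whose bottom edge lies below the word's top.
    for s_top_y_axis, s_item in lines.items():
        if top_y_axis < s_item[0][1]:
            lines[s_top_y_axis][1].append((top_x_axis, text.description))
            break

# Sort the words of each non-empty line from left to right and join them.
for _, item in lines.items():
    if item[1]:
        words = sorted(item[1], key=lambda t: t[0])
        items.append((item[0], ' '.join([word for _, word in words]), words))

print(items)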
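The line-grouping step can also be tried on plain tuples, which makes it testable without calling the API. The following is only a minimal sketch under stated assumptions: each word is reduced to a hypothetical `(left_x, top_y, bottom_y, text)` tuple (taken from `vertices[0].x`, `vertices[0].y` and `vertices[3].y` as in the code above), the function name `group_words_into_lines` is made up for illustration, and a word is assigned to a line when its vertical center falls inside the band of that line's first word. It does not pair labels with values; it only shows the grouping.

```python
from typing import List, Tuple

# Hypothetical word layout: (left_x, top_y, bottom_y, text)
Word = Tuple[int, int, int, str]


def group_words_into_lines(words: List[Word]) -> List[str]:
    """Group words into text lines by vertical position, left to right within a line."""
    lines = []  # each entry: [top_y, bottom_y, [(x, text), ...]]
    # Visit words from top to bottom so the first word of a line defines its band.
    for x, top, bottom, text in sorted(words, key=lambda w: w[1]):
        center = (top + bottom) / 2
        for line in lines:
            # The word joins a line if its vertical center lies inside the line's band.
            if line[0] <= center <= line[1]:
                line[2].append((x, text))
                break
        else:
            # No existing line matched: start a new one with this word's band.
            lines.append([top, bottom, [(x, text)]])
    lines.sort(key=lambda entry: entry[0])
    return [
        " ".join(text for _, text in sorted(entry[2], key=lambda p: p[0]))
        for entry in lines
    ]


# Made-up sample coordinates, not taken from the real image.
sample = [
    (120, 12, 30, "Doe"),
    (10, 10, 28, "Name"),
    (60, 11, 29, "Jhon"),
    (10, 50, 68, "City"),
    (60, 52, 70, "London"),
]
print(group_words_into_lines(sample))  # ['Name Jhon Doe', 'City London']
```

Matching on the vertical center rather than the top edge tolerates words that sit slightly higher or lower than their neighbours, which is common in handwriting, but the band test is a simplification and may need tuning for a real form.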

Can anyone help me?

Thanks in advance.

