How do I reformat text from a JSON file?

Posted 2024-05-14 08:16:29


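I have a JSON file that contains a set of images and annotations. Each image has an id, and each annotation holds a caption plus the image_id of the image it describes. There are thousands of images, and several annotations refer to the same image. Here is an example with a single image and its annotations (link to full data):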

{
  "images": [
    {
      "license": 5,
      "url": "http://farm4.staticflickr.com/3153/2970773875_164f0c0b83_z.jpg",
      "file_name": "COCO_train2014_000000057870.jpg",
      "id": 57870,
      "width": 640,
      "date_captured": "2013-11-14 16:28:13",
      "height": 480
    }
  ],
  "annotations": [
    {
      "image_id": 57870,
      "id": 787980,
      "caption": "A restaurant has modern wooden tables and chairs."
    },
    {
      "image_id": 57870,
      "id": 789366,
      "caption": "A long restaurant table with rattan rounded back chairs."
    },
    {
      "image_id": 57870,
      "id": 789888,
      "caption": "a long table with a plant on top of it surrounded with wooden chairs "
    },
    {
      "image_id": 57870,
      "id": 791316,
      "caption": "A long table with a flower arrangement in the middle for meetings"
    },
    {
      "image_id": 57870,
      "id": 794853,
      "caption": "A table is adorned with wooden chairs with blue accents."
    }
  ]
}

I need to restructure the text in this file into the following format:

COCO_train2014_000000057870.jpg#0 A restaurant has modern wooden tables and chairs.
COCO_train2014_000000057870.jpg#1 A long restaurant table with rattan rounded back chairs.
COCO_train2014_000000057870.jpg#2 a long table with a plant on top of it surrounded with wooden chairs
COCO_train2014_000000057870.jpg#3 A long table with a flower arrangement in the middle for meetings
COCO_train2014_000000057870.jpg#4 A table is adorned with wooden chairs with blue accents.

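I understand the idea, but I am not good at programming it in Python. I need to first check whether the image_id values are equal; if they are, I need to collect those annotations' ids, number them from 0 to 4, and then get their captions.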


1 Answer
User
#1 · Posted 2024-05-14 08:16:29

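After reading in the data, reorganize the images into a dictionary indexed by ID so that the correct image is easy to look up while iterating over the annotations. The code below does this, and also appends each caption to a list of captions attached to its image: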

import json

with open('captions_train2014.json') as f:
    data = json.load(f)

# Collect all images into a dictionary indexed by ID
images = {p['id']:p for p in data['images']}

# To each image, add a list of captions
for image in images.values():
    image['captions'] = []

# For each annotation, add its caption to its
# corresponding image's caption list.
for annotation in data['annotations']:
    image_id = annotation['image_id']
    images[image_id]['captions'].append(annotation['caption'])

# Iterate over images and print captions in the format requested.
for image in images.values():
    for i,caption in enumerate(image['captions']):
        print(f"{image['file_name']}#{i} {caption}")

Output:

COCO_train2014_000000057870.jpg#0 A restaurant has modern wooden tables and chairs.
COCO_train2014_000000057870.jpg#1 A long restaurant table with rattan rounded back chairs.
COCO_train2014_000000057870.jpg#2 a long table with a plant on top of it surrounded with wooden chairs
COCO_train2014_000000057870.jpg#3 A long table with a flower arrangement in the middle for meetings
COCO_train2014_000000057870.jpg#4 A table is adorned with wooden chairs with blue accents.
COCO_train2014_000000384029.jpg#0 A man preparing desserts in a kitchen covered in frosting.
COCO_train2014_000000384029.jpg#1 A chef is preparing and decorating many small pastries.
COCO_train2014_000000384029.jpg#2 A baker prepares various types of baked goods.
COCO_train2014_000000384029.jpg#3 a close up of a person grabbing a pastry in a container
COCO_train2014_000000384029.jpg#4 Close up of a hand touching various pastries.
COCO_train2014_000000222016.jpg#0 a big red telephone booth that a man is standing in
COCO_train2014_000000222016.jpg#1 a person standing inside of a phone booth
COCO_train2014_000000222016.jpg#2 this is an image of a man in a phone booth.
COCO_train2014_000000222016.jpg#3 A man is standing in a red phone booth.
COCO_train2014_000000222016.jpg#4 A man using a phone in a phone booth.
 ...
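
As a variation on the approach above, here is a minimal sketch that groups captions with `collections.defaultdict` instead of modifying the image dictionaries, and returns the lines rather than printing them. The function name `format_captions` and the shortened `sample` data are illustrative assumptions, not part of the original answer. It also strips surrounding whitespace from each caption, since at least one caption in the sample data ends with a trailing space.

```python
from collections import defaultdict


def format_captions(data):
    """Return lines of the form '<file_name>#<n> <caption>' (hypothetical helper)."""
    # Map each image ID to its file name.
    file_names = {img["id"]: img["file_name"] for img in data["images"]}

    # Group captions by image ID, keeping the order of the annotations.
    captions = defaultdict(list)
    for ann in data["annotations"]:
        captions[ann["image_id"]].append(ann["caption"].strip())

    # Number each image's captions from 0, in the order they were collected.
    lines = []
    for image_id, file_name in file_names.items():
        for n, caption in enumerate(captions.get(image_id, [])):
            lines.append(f"{file_name}#{n} {caption}")
    return lines


# Shortened sample in the same shape as the question's JSON (assumed data).
sample = {
    "images": [{"id": 57870, "file_name": "COCO_train2014_000000057870.jpg"}],
    "annotations": [
        {"image_id": 57870, "id": 787980,
         "caption": "A restaurant has modern wooden tables and chairs."},
        {"image_id": 57870, "id": 789888,
         "caption": "a long table with a plant on top of it surrounded with wooden chairs "},
    ],
}

lines = format_captions(sample)
print("\n".join(lines))
```

Returning a list makes it straightforward to write the result to a file with `"\n".join(lines)` instead of printing it. Images without any annotations simply produce no lines.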
