How can I read fields from JSON-LD into a CSV?

Posted 2024-04-24 19:33:45

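I am trying to extract values from a JSON-LD file into a CSV, exactly as they appear in the file. I am running into two problems:
1. In most cases, the values read for the different fields are truncated. In the remaining cases, values that belong to one field show up under another field.
2. I also get an error, "Additional data", after about 4000 rows.
The file is fairly large (about half a GB). I am attaching a shortened version of my code. Please tell me where I am going wrong.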

Input file: I have shortened it and saved it here, since there is no way to paste it into the post.

https://github.com/Architsi/json-ld-issue

I tried writing this script, and I have also tried several online converters.

import csv, sys, math, operator, re, os, json, ijson
from pprint import pprint

filelist = []

for file in os.listdir("."):
    if file.endswith(".json"):
        filelist.append(file)

for input in filelist:

    newCsv = []
    splitlist = input.split(".")
    output = splitlist[0] + '.csv'

    newFile = open(output, 'w', newline='') #wb for windows, else you'll see newlines added to csv

    # initialize csv writer
    writer = csv.writer(newFile)

    #Name of the columns
    header_row = ('Format', 'Description', 'Object', 'DataProvider')

    writer.writerow(header_row)

    with open(input, encoding="utf8") as json_file:

        data = ijson.items(json_file, 'item')

        #passing all the values through try except
        for s in data:

            source = s['_source']

            try:
                source_resource = source['sourceResource']
            except:
                print ("Warning: No source resource in record ID: " + id)

            try:
                data_provider = source['dataProvider'].encode()
            except:
                data_provider = "N/A"

            try:
                _object = source['object'].encode()
            except:
                _object = "N/A"

            try:
                descriptions = source_resource['description']
                string = ""
                for item in descriptions:
                    if len(descriptions) > 1:
                        description = item.encode() #+ " | "
                    else:
                        description = item.encode()
                    string = string + description
                description = string.encode()
            except:
                description = "N/A"


            created = ""
#writing it to csv
            write_tuple = ('format', description, _object, data_provider)

            writer.writerow(write_tuple)

    print ("File written to " + output)
    newFile.close()
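
A possible cause of the wrong values in the script above: in Python 3, str.encode() returns bytes, so string + description raises a TypeError that the bare except silently turns into "N/A", and any bytes that do reach csv.writer are written as b'...'. In addition, if description is a single string rather than a list, the for loop walks over its characters. The sketch below avoids both by keeping everything as str. It assumes each field may be missing, a string, or a list of strings, and that format lives under sourceResource; neither is confirmed by the question. The names as_text and record_to_row and the " | " separator are illustrative only.

```python
import csv
import io


def as_text(value, sep=" | "):
    """Return a field as one string; the field may be missing, a str, or a list."""
    if value is None:
        return "N/A"
    if isinstance(value, str):
        return value
    if isinstance(value, (list, tuple)):
        parts = [as_text(v, sep) for v in value]
        return sep.join(parts) if parts else "N/A"
    # Anything else (number, dict, ...) is written in its str() form.
    return str(value)


def record_to_row(record):
    """Build one CSV row (Format, Description, Object, DataProvider) from a record."""
    source = record.get("_source", {})
    # Assumption: 'format' and 'description' both sit under 'sourceResource'.
    resource = source.get("sourceResource", {})
    return (
        as_text(resource.get("format")),
        as_text(resource.get("description")),
        as_text(source.get("object")),
        as_text(source.get("dataProvider")),
    )


# Small in-memory demonstration with a made-up record.
sample = {
    "_source": {
        "dataProvider": "Example Library",
        "sourceResource": {"description": ["First part", "Second part"]},
    }
}
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(("Format", "Description", "Object", "DataProvider"))
writer.writerow(record_to_row(sample))
print(buffer.getvalue())
```

Using dict.get instead of try/except also means a missing sourceResource no longer leaves a stale value from the previous record in the row.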

The error I get is raise common.JSONError('Additional data'). The expected result is a CSV file containing all the columns with the correct values.
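
An "Additional data" error generally means the parser found more content after the first complete top-level JSON value. One possibility, which is an assumption about the input and not something the question confirms, is that the file holds several JSON documents back to back (for example several arrays concatenated, or leftovers from shortening the file). ijson documents a multiple_values=True option for that situation. The sketch below shows the same idea with only the standard library, using json.JSONDecoder.raw_decode; the names iter_json_values and iter_records and the sample data are illustrative. It takes the whole text as a string, so for a half-GB file a streaming parser would still be the better fit.

```python
import json


def iter_json_values(text):
    """Yield each top-level JSON value found in text, one after another."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # Skip whitespace between values; raw_decode expects a value at pos.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value


def iter_records(text):
    """Yield records whether a top-level value is a list of records or one record."""
    for value in iter_json_values(text):
        if isinstance(value, list):
            yield from value
        else:
            yield value


# Made-up input: two arrays back to back, which a single-value parser rejects.
sample_text = '[{"_id": "a"}, {"_id": "b"}]\n[{"_id": "c"}]'
ids = [record["_id"] for record in iter_records(sample_text)]
print(ids)  # ['a', 'b', 'c']
```

If the real file turns out to be a single well-formed array, this changes nothing, and the error would instead point to stray content after the closing bracket.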
