Is there a way to convert specific JSON data to CSV?

Posted 2024-04-28 20:46:15


I have JSON data that looks like the sample below.

Here is the link to the full file: https://drive.google.com/file/d/1RqU2s0dqjd60dcYlxEJ8vnw9_z2fWixd/view?usp=sharing

result =

{
      "ERROR":[
         
      ],
      "LinkSetDbHistory":[
         
      ],
      "LinkSetDb":[
         {
            "Link":[
               {
                  "Id":"8116078"
               },
               {
                  "Id":"7654180"
               },
               {
                  "Id":"7643601"
               },
               {
                  "Id":"7017037"
               },
               {
                  "Id":"6190213"
               },
               {
                  "Id":"5902265"
               },
               {
                  "Id":"5441934"
               },
               {
                  "Id":"5417587"
               },
               {
                  "Id":"5370323"
               },
               {
                  "Id":"5362514"
               },
               {
                  "Id":"4818642"
               },
               {
                  "Id":"4330602"
               }
            ],
            "DbTo":"pmc",
            "LinkName":"pubmed_pmc_refs"
         }
      ],
      "DbFrom":"pubmed",
      "IdList":[
         "25209241"
      ]
   },




{
      "ERROR":[
   ],
  "LinkSetDbHistory":[
     
  ],
  "LinkSetDb":[
     {
        "Link":[
           {
              "Id":"7874507"
           },
           {
              "Id":"7378719"
           },
           {
              "Id":"6719480"
           },
           {
              "Id":"5952809"
           },
           {
              "Id":"4944516"
           }
        ],
        "DbTo":"pmc",
        "LinkName":"pubmed_pmc_refs"
     }
  ],
  "DbFrom":"pubmed",
  "IdList":[
     "25209630"
  ]

}

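For each record I want the number of Id entries in the Link array (12 for the first record) together with the ID from IdList: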

"IdList":"25209241"

So the final output would be:

IDList: length

25209241: 12 (Total number of Id in link array)
25209630 : 5 (Total number of Id in link array)

I tried this code, but it does not work for either a single record or multiple records:

pmc_ids = [link["Id"] for link in results["LinkSetDb"]["Link"]]
len(pmc_ids)

If there is a way to do this, how would it handle a large dataset?


2 Answers

The "Link" key sits inside a list, because "LinkSetDb" is a list of dictionaries. So change pmc_ids = [link["Id"] for link in results["LinkSetDb"]["Link"]] to pmc_ids = [link["Id"] for link in results["LinkSetDb"][0]["Link"]]
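As a quick check of that indexing fix, here is a minimal sketch on a single record. The record is a trimmed copy of the second sample from the question (only three of its five links are kept), and the variable name result follows the question's sample rather than the results used in the attempted code.

```python
# Trimmed copy of one record from the question (only three links kept).
result = {
    "ERROR": [],
    "LinkSetDbHistory": [],
    "LinkSetDb": [
        {
            "Link": [{"Id": "7874507"}, {"Id": "7378719"}, {"Id": "6719480"}],
            "DbTo": "pmc",
            "LinkName": "pubmed_pmc_refs",
        }
    ],
    "DbFrom": "pubmed",
    "IdList": ["25209630"],
}

# result["LinkSetDb"] is a list, so take its first element before reading "Link".
pmc_ids = [link["Id"] for link in result["LinkSetDb"][0]["Link"]]
print(result["IdList"][0], len(pmc_ids))
```

Indexing result["LinkSetDb"]["Link"] directly raises a TypeError, because a list cannot be indexed with a string key.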

To generate the CSV file, the code is as follows:

import csv
import json

# Load the whole file; it is expected to contain a list of records.
with open('Citation_with_ID.json', 'r') as f_json:
    json_dict = json.load(f_json)

csv_headers = ["IdList", "length"]
csv_values = []
for i in json_dict:
    # "LinkSetDb" is a list; it is empty when the record has no links.
    if len(i["LinkSetDb"]) > 0:
        pmc_ids = [link["Id"] for link in i["LinkSetDb"][0]["Link"]]
    else:
        pmc_ids = []
    length = len(pmc_ids)
    # Use the ID only when "IdList" holds exactly one entry.
    if len(i['IdList']) == 1:
        IdList = i['IdList'][0]
    else:
        IdList = None
    csv_values.append([IdList, length])

# newline='' keeps the csv module from writing blank lines between rows on Windows.
with open('mycsvfile.csv', 'w', newline='') as f_csv:
    w = csv.writer(f_csv)
    w.writerow(csv_headers)
    w.writerows(csv_values)

If you want to store the values in a dictionary instead, you can use something like this:

values_list = list(zip(*csv_values))
dict(zip(values_list[0],values_list[1]))
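The same mapping can also be built without transposing the rows first. This is a minimal sketch, assuming csv_values is a list of [IdList, length] pairs like the one built for the CSV file; the sample values are the counts given in the question, and id_to_length is a name chosen here for illustration.

```python
# Sample rows in the [IdList, length] shape used for the CSV output.
csv_values = [["25209241", 12], ["25209630", 5]]

# Each row is already a (key, value) pair, so it can be unpacked directly.
id_to_length = {id_list: length for id_list, length in csv_values}
print(id_to_length)

# An empty list of rows simply gives an empty dictionary.
empty_mapping = {id_list: length for id_list, length in []}
```

One difference worth noting: with the zip(*csv_values) form, indexing values_list[0] fails on an empty list of rows, while the comprehension returns an empty dictionary. Rows whose IdList is None would all collapse onto the single key None in either form.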

"LinkSetDb" is a list containing a single dictionary, but you are indexing it as if it were a dictionary. Use:

pmc_ids = [link["Id"] for link in result["LinkSetDb"][0]["Link"]]
len(pmc_ids)
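To cover both the single-record and the multi-record case from the question, the idea can be wrapped in a small helper. This is only a sketch under stated assumptions: the names iter_rows and write_csv are hypothetical, records with an empty "LinkSetDb" are counted as 0, and only the first entry of "LinkSetDb" is read, as in the line above. The sample records are shortened, and the CSV is written to an in-memory buffer instead of a file so the example runs on its own.

```python
import csv
import io


def iter_rows(data):
    """Yield one [IdList, length] row per record.

    `data` may be a single record (dict) or a list of records.
    """
    records = [data] if isinstance(data, dict) else data
    for record in records:
        link_sets = record.get("LinkSetDb", [])
        # Only the first link set is read; an empty list means no links.
        links = link_sets[0].get("Link", []) if link_sets else []
        id_list = record.get("IdList", [])
        pubmed_id = id_list[0] if len(id_list) == 1 else None
        yield [pubmed_id, len(links)]


def write_csv(data, f_csv):
    """Write the header and all rows to an open text file object."""
    writer = csv.writer(f_csv)
    writer.writerow(["IdList", "length"])
    # Rows are produced lazily, so no intermediate list of rows is built.
    writer.writerows(iter_rows(data))


# Shortened sample records (hypothetical link counts).
records = [
    {"LinkSetDb": [{"Link": [{"Id": "7874507"}, {"Id": "7378719"}]}],
     "IdList": ["25209630"]},
    {"LinkSetDb": [], "IdList": ["25209241"]},
]

buffer = io.StringIO()
write_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

On the large-dataset question: the generator avoids holding all output rows in memory, but json.load would still read the entire input file at once. For input that does not fit in memory, a streaming JSON parser would be needed, which is not shown here.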
