If a key is not in the JSON, set its value to null and insert it into the DataFrame

Posted 2024-06-17 07:47:15


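I am trying to read a value from a key in a JSON string and then insert that data into a DataFrame.
Sometimes the key is present in the structure and sometimes it is not, so when it is missing I get an error like KeyError: 'Company_Name_Abb'.

When I hit this error, the data is not inserted into the DataFrame and the loop passes on to the next contract, which is not what I want. If a key does not exist, I want to set it to a null value so the row can still be inserted into the DataFrame.

As you can see in the JSON string below, Company_Name_Abb is not present: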

{
  "success": "true",
  "code": 200,
  "messages": null,
  "response": {
    "templateType": "",
    "metadata": {
      "AUTO_RENEWAL_PERIOD_MONTHS": "0",
      "Reminder_date": "",
      "Monetary_Type": "",
      "Legal_Entity": "Test Inc.",
      "BIZ_DEV_CONTACT": "Goofy",
      "Business_Owner": "Minnie Mouse",
      "Contract_Status": "Active",
      "Cond_exp_terms": "",
      "Contract_Category": "General",
      "Contract_Term_Period_UoM": "Years",
      "Customer_ID_Intel": "",
      "Expiry_Date_words": "June 08, 2024",
      "Royalty_Max_Value": "",
      "TIMEZONE": "US/Pacific - (GMT-8:00)",
      "Audit_Section_Num": "",
      "Same_as_Contract_Signed_Date": "Enter Date",
      "CNDA_Number": "12345",
      "Contract_Title": "'TEST AGREEMENT FOR API No 2",
      "ConfidentialContract": "No",
      "Heirarchy_Status": "Master Agreement",
      "RSNDA_Number": "",
      "Contract_Signed_On_date": 1623263400000,
      "Termination_Section_Num": "",
      "Region": "N.A.",
      "Contract_Expiration_Date": 1717785000000,
      "Historical_Audit_Date": "",
      "RSNDA_Date": "",
      "PCM_Contact": "",
      "License_Technology": "",
      "PCM_Obligation": "No",
      "MILES_SUMM": "",
      "Effective_Date_words": "June 09, 2021",
      "Amendments": [],
      "Contract_Effective_Date": 1623177000000,
      "Folder": "Default",
      "RUNDA_Date": "",
      "Contract_Owner": "test@test.com",
      "Licensed_Tech_Name": "",
      "Payment_Terms": "",
      "SUBLICENSE": "Test",
      "AUTO_RENEW": "Fixed",
      "ATTORNEY_NOTES": "",
      "Vendor_ID_Intel": "",
      "Attorney_Owner": "",
      "Monetary_Num": "",
      "RUNDA_Number": "",
      "Notes": "",
      "Contract_Desc": "This is a test",
      "ContractualDoc": [],
      "Contract_Renewal_Date": "",
      "Monetary_Sec_Num": "",
      "SearchableContract": "Yes",
      "AUTO_RENEWAL_MAXIMUM_NUMBER_OF_RENEWALS": "0",
      "AUTO_RENEWAL_NOTICE_DAYS": "0",
      "Intel_Product_Family": "",
      "CNDA_Date": 1623177000000,
      "DEV_PARTNER": "",
      "Contract_Term": "3"
    },
    "contractNumber": "INTEL29017",
    "supportingDocuments": [],
    "statusCode": ""
  },
  "responseMap": null
}

In the example below I only create two variables, but in practice I read all of the keys from the JSON. The JSON above is also just one example, since the for loop retrieves many JSON strings like it.

How I get the values (JSON.text is what I retrieve from the API):

for element in jsoncontracts['response']['rows']:
    try:  
        jsoncontractsdetails = json.loads(JSON.text)
        
        Auto_Renewal_Period_Months, Company_Name_Abb = jsoncontractsdetails['response']['metadata']['AUTO_RENEWAL_PERIOD_MONTHS'], jsoncontractsdetails['response']['metadata']['Company_Name_Abb']

        df_contractnumberdetails = pd.DataFrame([[Auto_Renewal_Period_Months, Company_Name_Abb]], columns=['AUTO_RENEWAL_PERIOD_MONTHS', 'Company_Name_Abb'], index=None)

    except Exception as e:
        print('ContractNumberError: ' + contract)
        print(e)            
        pass

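I have tried catching the KeyError and was able to assign a value to the key, but I could not insert it into the DataFrame because execution had already moved on to the next contract in the loop.

Update:

I tried using defaultdict from the collections module, but I still get KeyError: 'Company_Name_Abb' when accessing the key: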

jsoncontractsdetails = defaultdict(lambda: -1, jsoncontractsdetails)
Company_Name_Abb = jsoncontractsdetails['response']['metadata']['Company_Name_Abb']

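2 answers

pandas can work with dictionaries directly: you can feed the rows to pd.DataFrame and then select only the columns you need. Keys that are missing from a given row are filled with NaN: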

df = pd.DataFrame([m['response']['metadata'] for m in jsoncontracts['response']['rows']])

If you want to select the columns beforehand:

df = pd.DataFrame(
    [
        {
           k:v for k,v in m['response']['metadata'].items() 
          if k in ('AUTO_RENEWAL_PERIOD_MONTHS', 'Company_Name_Abb')
        } 
        for m in jsoncontracts['response']['rows']
    ]
)
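
One caveat with filtering beforehand: if none of the rows contains a given key, that column will not exist in the result at all. A minimal sketch of one way around this, assuming the `records` list below stands in for the parsed API responses (the sample data and the `wanted` name are made up for illustration), is to reindex the columns afterwards:

```python
import pandas as pd

# Hypothetical sample records standing in for the parsed API responses.
records = [
    {"response": {"metadata": {"AUTO_RENEWAL_PERIOD_MONTHS": "0"}}},
    {"response": {"metadata": {"AUTO_RENEWAL_PERIOD_MONTHS": "12",
                               "Company_Name_Abb": "TST"}}},
]

# Columns that must appear in the result, whether or not any record has them.
wanted = ["AUTO_RENEWAL_PERIOD_MONTHS", "Company_Name_Abb"]

# Build the frame from the metadata dicts; missing keys become NaN.
df = pd.DataFrame([r["response"]["metadata"] for r in records])

# reindex keeps only the wanted columns and creates any that are absent.
df = df.reindex(columns=wanted)

print(df)
```

Missing values show up as NaN, which pandas treats as null (`pd.isna` returns True for them).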

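There are several ways to do this:

  • First, and fancier, use the schema library and define the schema of the JSON object you expect. You can mark fields as required or optional, possibly with default values.
  • Second, and fancier still, use dataclasses together with the dataclasses-json library for more convenient access. In the long run I think this is the preferable approach, because you end up working with a very convenient data structure.
  • Finally, the quickest way is to use the dict methods get() (to read a value) or setdefault() (to set a value for a nested dict that may not exist). For example, you can access one of the fields with jsoncontractsdetails.get('response', {}).get('metadata', {}).get('AUTO_RENEWAL_PERIOD_MONTHS', None). This approach gets messy very quickly, but it works fine for one-off or occasional use :)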
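
The get() option could be sketched roughly as follows. The helper name `extract_row` and the sample `parsed` data are hypothetical; in the question, each item would come from `json.loads(...)` inside the loop. The helper also explains why the defaultdict attempt in the question still failed: wrapping only the outer dict leaves the nested `metadata` dict a plain dict, so the lookup has to be made safe at every level.

```python
import pandas as pd

def extract_row(details, keys):
    """Return one row dict; keys absent from the metadata map to None."""
    # `or {}` also covers the case where "response" or "metadata" is null.
    metadata = (details.get("response") or {}).get("metadata") or {}
    return {key: metadata.get(key) for key in keys}

keys = ["AUTO_RENEWAL_PERIOD_MONTHS", "Company_Name_Abb"]

# Hypothetical parsed responses; the second one has no usable metadata.
parsed = [
    {"response": {"metadata": {"AUTO_RENEWAL_PERIOD_MONTHS": "0"}}},
    {"response": None},
]

# Collect all rows first, then build the DataFrame once after the loop.
rows = [extract_row(details, keys) for details in parsed]
df = pd.DataFrame(rows, columns=keys)

print(df)
```

Collecting the rows in a list and building the DataFrame once also avoids overwriting the DataFrame on every iteration, which the loop in the question does.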
