TypeError: Object of type dict_values is not JSON serializable

Posted 2024-04-28 21:31:32


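I have been using Python and the Requests module to fetch all of Ireland's COVID-19 data by location. I have run into a problem: the dict holding the data I get from the API calls cannot be converted to JSON, and I want to display that JSON in a DataFrame.

Unfortunately, the API's website is limited, so I have to use a for loop over every location to fetch the data for that area. As I do so, I push each result into a dictionary, thisdict["name of that location"].

I then try to convert all 25 dicts it creates to JSON: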

toJSON = json.dumps(thisdict.values())
data = json.loads(toJSON)

This is where I get the error. However, if I use just one of the locations I created, it works, but I need all of the locations. Is this possible?

toJSON = json.dumps(thisdict["Dublin"])
data = json.loads(toJSON)

I have tried:

toJSON = json.dumps(*thisdict)
data = json.loads(toJSON)

toJSON = json.dumps(list(thisdict.values()))
data = json.loads(toJSON)

which I found here: https://markhneedham.com/blog/2017/03/19/python-3-typeerror-object-type-dict_values-not-json-serializable/

All of the code is at this link: https://replit.com/@MrGallen/GetC19ApiToCSVEveryCounty#main.py

# to handle data retrieval
import requests
# to manage json data
import json
# for pandas dataframes
import pandas as pd

counties = ["Carlow", "Cavan", "Clare", "Cork", "Donegal", "Dublin", "Galway", "Kerry", "Kildare", "Kilkenny", "Laois", "Leitrim", "Limerick", "Longford", "Louth", "Mayo", "Meath", "Monaghan", "Offaly", "Roscommon", "Sligo", "Tipperary", "Waterford", "Westmeath", "Wexford", "Wicklow"]
thisdict = {}
for county in counties:
  url = "https://services1.arcgis.com/eNO7HHeQ3rUcBllm/arcgis/rest/services/Covid19CountyStatisticsHPSCIreland/FeatureServer/0/query?where=CountyName%20%3D%20'"+county+"'&outFields=CountyName,PopulationCensus16,TimeStamp,ConfirmedCovidCases,PopulationProportionCovidCases,ConfirmedCovidDeaths,ConfirmedCovidRecovered&returnGeometry=false&outSR=4326&f=json"
  r = requests.get(url, stream=True)
  #info = r.headers
  #print(info)
  r = r.json()
  r = r["features"]
  thisdict[county] = r

# decode json data into a dict object
toJSON = json.dumps(thisdict)
data = json.loads(toJSON)
# in this dataset, the data to extract is under 'features'
with open("sample.json", "w") as outfile: 
    json.dump(data, outfile)

df = pd.json_normalize(data)
print(df.head(10))

# Select a number of columns - all rows
CD = df[['attributes.CountyName', 'attributes.TimeStamp', 'attributes.ConfirmedCovidCases']]

print(CD) # DataFrame

1 Answer
User
#1 · Posted 2024-04-28 21:31:32

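The real problem here is the overall shape of the dict you start with. You end up needlessly nesting multiple lists when all you need is one big list of county attributes.

response.json()["features"] is a list of attributes. For example, Carlow returns a list of 407 attributes for that county. So in thisdict you end up with a dict keyed by county whose values are all lists of attributes.

You then try to take the dict_values of that dict, which (if you convert the dict_values to a list) produces an unnecessary list of lists of attributes.

As far as JSON (de)serialization is concerned, this actually works. For example, this does not raise a serialization exception: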

toJSON = json.dumps(list(thisdict.values()))
data = json.loads(toJSON)
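
As a minimal sketch of the difference between the two calls: the data below is made up and only mimics the shape of the API response (county name mapped to a list of feature dicts), and try_dumps is a hypothetical helper, not part of the json module.

```python
import json

# Hypothetical stand-in for thisdict: county name -> list of feature dicts.
thisdict = {
    "Carlow": [{"attributes": {"CountyName": "Carlow", "ConfirmedCovidCases": 0}}],
    "Dublin": [{"attributes": {"CountyName": "Dublin", "ConfirmedCovidCases": 5}}],
}


def try_dumps(obj):
    """Return the JSON string, or the TypeError message if obj cannot be serialized."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"


# A dict_values view is not one of the types the json encoder understands.
view_result = try_dumps(thisdict.values())

# Converting the view to a plain list makes it serializable.
list_result = try_dumps(list(thisdict.values()))

print(view_result)
print(list_result)
```

The second call succeeds, but note that what it serializes is a list of lists, one inner list per county.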

However, when you later try to pass this list of lists to pandas.json_normalize(), you run into a problem:

Traceback (most recent call last):
  File "/home/dephekt/pandas/main.py", line 25, in <module>
    df = pd.json_normalize(data)
  File "/home/dephekt/pandas/.venv/lib/python3.8/site-packages/pandas/io/json/_normalize.py", line 270, in _json_normalize
    if any([isinstance(x, dict) for x in y.values()] for y in data):
  File "/home/dephekt/pandas/.venv/lib/python3.8/site-packages/pandas/io/json/_normalize.py", line 270, in <genexpr>
    if any([isinstance(x, dict) for x in y.values()] for y in data):
AttributeError: 'list' object has no attribute 'values'

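You can see in the traceback that it expects data to be a dict or a list of dicts. It does for y in data and then tries to call y.values(), assuming y is a dictionary. In this case y is a list nested inside a list, and lists have no values() method, so it raises AttributeError.

Consider the following code: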
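
To make the shape problem concrete, here is a small sketch using only the standard library. The nested list is invented sample data, and itertools.chain is just one of several ways to flatten it; it is not what pandas does internally.

```python
from itertools import chain

# Hypothetical sample shaped like list(thisdict.values()): one inner list per county.
nested = [
    [{"attributes": {"CountyName": "Carlow"}}, {"attributes": {"CountyName": "Carlow"}}],
    [{"attributes": {"CountyName": "Cavan"}}],
]

# Each top-level item is a list, which has no .values() method.
nested_has_values = [hasattr(item, "values") for item in nested]

# Flattening one level yields a list of dicts, the shape described above.
flat = list(chain.from_iterable(nested))
flat_has_values = [hasattr(item, "values") for item in flat]

print(nested_has_values)
print(flat_has_values)
print(len(flat))
```

Once every top-level item is a dict, the y.values() call shown in the traceback has something to work with.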

import json
import logging

import pandas as pd

import requests

logging.basicConfig(level=logging.DEBUG)

counties = [
    "Carlow",
    "Cavan",
    "Clare",
    "Cork",
    "Donegal",
    "Dublin",
    "Galway",
    "Kerry",
    "Kildare",
    "Kilkenny",
    "Laois",
    "Leitrim",
    "Limerick",
    "Longford",
    "Louth",
    "Mayo",
    "Meath",
    "Monaghan",
    "Offaly",
    "Roscommon",
    "Sligo",
    "Tipperary",
    "Waterford",
    "Westmeath",
    "Wexford",
    "Wicklow",
]

results = []
s = requests.Session()

for county in counties:
    url = "https://services1.arcgis.com/eNO7HHeQ3rUcBllm/arcgis/rest/services/Covid19CountyStatisticsHPSCIreland/FeatureServer/0/query"
    fields = "CountyName,PopulationCensus16,TimeStamp,ConfirmedCovidCases,PopulationProportionCovidCases,ConfirmedCovidDeaths,ConfirmedCovidRecovered"
    params = {
        "where": f"CountyName='{county}'",
        "outFields": fields,
        "returnGeometry": False,
        "outSR": 4326,
        "f": "json",
    }
    response = s.get(url, params=params, timeout=10)
    response.raise_for_status()

    # Fall back to an empty list so the loop below is safe if "features" is missing
    features = response.json().get("features", [])

    for feature in features:
        results.append(feature)

with open("sample.json", "w") as outfile:
    json.dump(results, outfile)

df = pd.json_normalize(results)
print(df.head(10))

CD = df[["attributes.CountyName", "attributes.TimeStamp", "attributes.ConfirmedCovidCases"]]

print(CD)

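What I'm doing here is creating a requests session (more on that below). Then, for each county's response, we get the list of attributes from the features key, iterate over that list, and put each individual attribute into the results list.

What you end up with is one big list of all the attributes for all the counties.

We also avoid some unnecessary gymnastics, where you call json.dumps() only to json.loads() it straight back into the format you already had, for example: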

toJSON = json.dumps(thisdict)
data = json.loads(toJSON)

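Here, data ends up exactly the same as thisdict, so there is no reason to deserialize it back into a dict. You can simply dump the results to write them to a file. For the pandas argument, you can just pass thisdict (or, in my example, results, which is a list).

All of this gives the following output: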
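
A quick way to convince yourself that the round trip is a no-op, sketched with invented sample data. This assumes the data contains only JSON-native types (string keys, lists, numbers, strings, None), which should be the case for anything that came out of response.json().

```python
import io
import json

# Hypothetical sample shaped like the results list built in the loop above.
results = [
    {"attributes": {"CountyName": "Carlow", "ConfirmedCovidCases": 0}},
    {"attributes": {"CountyName": "Wicklow", "ConfirmedCovidCases": 4497}},
]

# Serializing and immediately deserializing gives back an equal object.
round_tripped = json.loads(json.dumps(results))
is_same = round_tripped == results

# Writing to a file only needs the dump step; StringIO stands in for a real file here.
buffer = io.StringIO()
json.dump(results, buffer)
written = buffer.getvalue()

print(is_same)
print(written)
```

Since the round-tripped object compares equal to the original, the original can be passed on directly.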

  attributes.CountyName  ...  attributes.ConfirmedCovidRecovered
0                Carlow  ...                                None
1                Carlow  ...                                None
2                Carlow  ...                                None
3                Carlow  ...                                None
4                Carlow  ...                                None
5                Carlow  ...                                None
6                Carlow  ...                                None
7                Carlow  ...                                None
8                Carlow  ...                                None
9                Carlow  ...                                None

[10 rows x 7 columns]

      attributes.CountyName  ...  attributes.ConfirmedCovidCases
0                    Carlow  ...                               0
1                    Carlow  ...                               0
2                    Carlow  ...                               0
3                    Carlow  ...                               0
4                    Carlow  ...                               0
...                     ...  ...                             ...
10577               Wicklow  ...                            4460
10578               Wicklow  ...                            4473
10579               Wicklow  ...                            4483
10580               Wicklow  ...                            4491
10581               Wicklow  ...                            4497

[10582 rows x 3 columns]

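If that is not exactly the format you want, you can manipulate it further before sending it to the pandas DataFrame, but hopefully this gives you a good example to work from.

Because a lot of requests are made in the loop, I created a requests.Session() object. It uses urllib3's connection pooling and reuses the same underlying TCP connection for each GET request, which performs better. That way you are not negotiating a new TCP connection on every loop iteration. It establishes one connection and sends all 26 requests over it, rather than opening 26 TCP connections that each carry one tiny request.

I also separated your request's query parameters from the actual URL endpoint, mostly for my own sanity. You can build the URL the way you did before, but this gives you more flexibility when you need to change the values of those parameters. It would work with your long URL too; I just found it hard to work with.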
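
To show roughly what passing a params dict buys you, here is a sketch that uses the standard library's urlencode rather than requests itself. build_url is a hypothetical helper, the field list is shortened for readability, and the exact encoding requests produces may differ in small details.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = (
    "https://services1.arcgis.com/eNO7HHeQ3rUcBllm/arcgis/rest/services/"
    "Covid19CountyStatisticsHPSCIreland/FeatureServer/0/query"
)


def build_url(county):
    """Build the query URL for one county from a plain dict of parameters."""
    params = {
        "where": f"CountyName='{county}'",
        "outFields": "CountyName,TimeStamp,ConfirmedCovidCases",
        "returnGeometry": "false",
        "f": "json",
    }
    # urlencode percent-escapes the quotes and the equals sign for us.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_url("Carlow")

# Parsing the query string back recovers the original, unescaped values.
query = parse_qs(urlsplit(url).query)

print(url)
print(query["where"])
```

Changing a parameter then means editing one dict entry instead of hand-escaping a long URL string.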
