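How to handle duplicate keys with simplejson
I'm using the Alchemy API on App Engine, so I use the simplejson library to parse the responses. The problem is that some entries in the response have the same name: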
{
"status": "OK",
"usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
"url": "",
"language": "english",
"entities": [
{
"type": "Person",
"relevance": "0.33",
"count": "1",
"text": "Michael Jordan",
"disambiguated": {
"name": "Michael Jordan",
"subType": "Athlete",
"subType": "AwardWinner",
"subType": "BasketballPlayer",
"subType": "HallOfFameInductee",
"subType": "OlympicAthlete",
"subType": "SportsLeagueAwardWinner",
"subType": "FilmActor",
"subType": "TVActor",
"dbpedia": "http://dbpedia.org/resource/Michael_Jordan",
"freebase": "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000029161",
"umbel": "http://umbel.org/umbel/ne/wikipedia/Michael_Jordan",
"opencyc": "http://sw.opencyc.org/concept/Mx4rvViVq5wpEbGdrcN5Y29ycA",
"yago": "http://mpii.de/yago/resource/Michael_Jordan"
}
}
]
}
So the problem is that the "subType" key is repeated, and when I load the data I end up with just "TVActor" instead of a list. Is there any way around this?
2 Answers
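A minimal reproduction of the behaviour described above, using the standard-library json module (simplejson behaves the same way by default, since both build the dict from the key/value pairs in document order, so a later duplicate overwrites an earlier one):

```python
import json

# Trimmed-down version of the "disambiguated" object from the question.
raw = '{"name": "Michael Jordan", "subType": "Athlete", "subType": "TVActor"}'

data = json.loads(raw)
# Each later duplicate overwrites the earlier one,
# so only the last "subType" value survives.
print(data["subType"])  # TVActor
```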
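The application/json media type defined in RFC 4627 recommends unique keys within an object, but it does not explicitly forbid duplicates:
The names within an object SHOULD be unique.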
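And RFC 2119 defines SHOULD as:
SHOULD: This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.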
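Since SHOULD leaves duplicate keys technically legal, a parser that wants to be strict has to check for them itself. A minimal sketch using the standard-library json module's object_pairs_hook (simplejson accepts the same argument); the hook name reject_duplicates is hypothetical, and raising ValueError is just one possible policy:

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that refuses any JSON object with a repeated key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError("duplicate key in JSON object: %r" % (key,))
        seen[key] = value
    return seen

# Objects with unique keys decode to ordinary dicts.
ok = json.loads('{"a": 1, "b": 2}', object_pairs_hook=reject_duplicates)

# Objects with duplicate keys fail loudly instead of silently losing data.
try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)  # duplicate key in JSON object: 'a'
```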
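So this is a known issue.
You can work around it by renaming the duplicate keys or by collecting them into an array. If it helps, you can use the following code: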
import json

def parse_object_pairs(pairs):
    """
    Take the list of (key, value) tuples for one JSON object and check
    it for duplicate keys. If a key is repeated, return the list of
    pairs unchanged; otherwise return a dict built from the pairs.

    >>> parse_object_pairs([("color", "red"), ("size", 3)])
    {'color': 'red', 'size': 3}
    >>> parse_object_pairs([("color", "red"), ("size", 3), ("color", "blue")])
    [('color', 'red'), ('size', 3), ('color', 'blue')]

    :param pairs: list of (key, value) tuples.
    :return: dict, or the original list if any key is repeated.
    """
    dict_without_duplicate = dict()
    for k, v in pairs:
        if k in dict_without_duplicate:
            # Duplicate key found: keep every pair by returning the raw list
            return pairs
        dict_without_duplicate[k] = v
    return dict_without_duplicate

decoder = json.JSONDecoder(object_pairs_hook=parse_object_pairs)
str_json_can_be_with_duplicate_keys = '{"color": "red", "size": 3, "color": "red"}'
# "color" repeats, so this is [('color', 'red'), ('size', 3), ('color', 'red')]
data_after_decode = decoder.decode(str_json_can_be_with_duplicate_keys)
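The hook runs on every object in the document, so only objects that actually contain duplicates come back as lists of (key, value) tuples; everything else is still a dict. A usage sketch on a trimmed version of the question's response (the hook is restated so the snippet runs on its own, and the field selection is illustrative):

```python
import json

def parse_object_pairs(pairs):
    # Same idea as the hook above: dict if keys are unique, else the raw pairs.
    result = {}
    for key, value in pairs:
        if key in result:
            return pairs
        result[key] = value
    return result

response = '''{"entities": [{"text": "Michael Jordan",
  "disambiguated": {"name": "Michael Jordan",
                    "subType": "Athlete", "subType": "TVActor"}}]}'''

data = json.loads(response, object_pairs_hook=parse_object_pairs)
disambiguated = data["entities"][0]["disambiguated"]

# The object with duplicate keys arrives as a list of (key, value) tuples,
# so the repeated values can be gathered with a comprehension.
sub_types = [value for key, value in disambiguated if key == "subType"]
print(sub_types)  # ['Athlete', 'TVActor']
```

The trade-off of this design is that callers must check whether they got a dict or a list before indexing by key.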
RFC 4627, which defines application/json, says:
An object is an unordered collection of zero or more name/value pairs
and:
The names within an object SHOULD be unique.
So AlchemyAPI should not return several "subType" names in the same object while calling the result JSON.
You could request the same data in XML format (outputMode=xml) to avoid the ambiguity in the results, or convert the values of duplicate keys into lists:
import simplejson as json
from collections import defaultdict

def multidict(ordered_pairs):
    """Collect the values of duplicate keys into lists."""
    # read all values into lists
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    # unpack lists that have only 1 item
    for k, v in d.items():
        if len(v) == 1:
            d[k] = v[0]
    return dict(d)
Example:

text = """{
    "type": "Person",
    "subType": "Athlete",
    "subType": "AwardWinner"
}"""

print(json.JSONDecoder(object_pairs_hook=multidict).decode(text))

Output (Python 2):
{u'subType': [u'Athlete', u'AwardWinner'], u'type': u'Person'}
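One consequence of unpacking single-item lists is that the type of "subType" depends on how many values the API happened to send: a string for one, a list for several. If downstream code always expects a list, a variant can force lists for selected keys. A sketch under that assumption; the name multidict_always_list and the LIST_KEYS set are hypothetical and would need adjusting for the real response. It uses the standard-library json module, whose JSONDecoder accepts the same object_pairs_hook argument as simplejson's:

```python
import json
from collections import defaultdict

# Keys that should always decode to a list (an assumption: adjust per API).
LIST_KEYS = {"subType"}

def multidict_always_list(ordered_pairs):
    """Group duplicate keys into lists; keys in LIST_KEYS are always lists."""
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    # Unpack single values, except for keys that must stay lists.
    return {k: v if (len(v) > 1 or k in LIST_KEYS) else v[0]
            for k, v in d.items()}

decoder = json.JSONDecoder(object_pairs_hook=multidict_always_list)

one = decoder.decode('{"type": "Person", "subType": "Athlete"}')
many = decoder.decode('{"type": "Person", "subType": "Athlete", "subType": "AwardWinner"}')
print(one["subType"])   # ['Athlete']
print(many["subType"])  # ['Athlete', 'AwardWinner']
```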