Parse JSON in a unicode string into a dict

input = u'[{ attributes: { NAME: "Name_1ĂĂÎÎ", TYPE: "Tip1", LOC_JUD: "Bucharest", LAT_LON: "234343/432545", S70: "2342345", MAP: "Map_one", SCH: "1:5000, SURSA: "PPP" } }, { attributes: { NAME: "NAME_2șțț", TYPE: "Tip2", LOC_JUD: "cea", LAT_LON: "123/54645", S70: "4324", MAP: "Map_two", SCH: "1:578000", SURSA: "PPP" } } ] '

2 answers

User

Answer 1 · edited 2024-05-14 03:16:49

Try a simplified example:

s = '[{attributes: { a: "foo", b: "bar" } }]'

The main problem is that the string is not valid JSON:

>>> json.loads(s)
[...]
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)

If you generate the input yourself, fix it at the source. If it comes from somewhere else, you need to edit it before loading it with the json module.

Note how json.loads() works once the input is correct JSON:

>>> s = '[{"attributes": { "a": "foo", "b": "bar" } }]'
>>> json.loads(s)
[{'attributes': {'a': 'foo', 'b': 'bar'}}]
>>> type(json.loads(s))
list
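For the simplified example, that editing step can be sketched with a regular expression that quotes any bare word sitting right after `{` or `,` and before a colon. This is a minimal sketch, assuming every key is a plain identifier and that no string value contains that pattern; `quote_keys` is a hypothetical helper name, not part of the json module.

```python
import json
import re

# Assumption: keys are bare word characters that directly follow "{" or ","
# (optionally after whitespace) and are followed by a colon.
UNQUOTED_KEY = re.compile(r'([{,]\s*)(\w+)(\s*:)')


def quote_keys(text):
    """Wrap bare object keys in double quotes so json.loads accepts them."""
    return UNQUOTED_KEY.sub(r'\1"\2"\3', text)


s = '[{attributes: { a: "foo", b: "bar" } }]'
fixed = quote_keys(s)
print(fixed)  # [{"attributes": { "a": "foo", "b": "bar" } }]
print(json.loads(fixed))
```

The repaired string is exactly the valid JSON shown above, so json.loads returns the same list.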

User

Answer 2 · edited 2024-05-14 03:16:49

As others have mentioned, your input data is not JSON. Ideally, this should be fixed upstream so that you receive valid JSON.

However, if that is beyond your control, you can convert the data to JSON yourself.

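The main problem is the unquoted keys. We can fix this by using a regular expression to look for a valid name in the first colon-separated field of each line. If a valid name is found, we wrap it in double quotes.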

# -*- coding: utf-8 -*-
# (Python 2 needs the declaration above for the non-ASCII literals below.)
import json
import re

source = u'''[{
        attributes: {
            NAME: "Name_1ĂĂÎÎ",
            TYPE: "Tip1",
            LOC_JUD: "Bucharest",
            LAT_LON: "234343/432545",
            S70: "2342345",
            MAP: "Map_one",
            SCH: "1:5000",
            SURSA: "PPP"
        }
    }, {
        attributes: {
            NAME: "NAME_2șțț",
            TYPE: "Tip2",
            LOC_JUD: "cea",
            LAT_LON: "123/54645",
            S70: "4324",
            MAP: "Map_two",
            SCH: "1:578000",
            SURSA: "PPP"
        }
    }
]
'''

# Split source into lines, then split lines into colon-separated fields
a = [s.strip().split(': ') for s in source.splitlines()]

# Wrap names in first field in double quotes
valid_name = re.compile(r'(^\w+$)')
for row in a:
    row[0] = valid_name.sub(r'"\1"', row[0])

# Recombine the data and load it
data = json.loads(' '.join([': '.join(row) for row in a]))

# Test 

print(data[0]["attributes"])
print('- ' * 30)
print(json.dumps(data, indent=4, ensure_ascii=False))

Output (Python 2)

{u'LOC_JUD': u'Bucharest', u'NAME': u'Name_1\u0102\u0102\xce\xce', u'MAP': u'Map_one', u'SURSA': u'PPP', u'S70': u'2342345', u'TYPE': u'Tip1', u'LAT_LON': u'234343/432545', u'SCH': u'1:5000'}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
[
    {
        "attributes": {
            "LOC_JUD": "Bucharest", 
            "NAME": "Name_1ĂĂÎÎ", 
            "MAP": "Map_one", 
            "SURSA": "PPP", 
            "S70": "2342345", 
            "TYPE": "Tip1", 
            "LAT_LON": "234343/432545", 
            "SCH": "1:5000"
        }
    }, 
    {
        "attributes": {
            "LOC_JUD": "cea", 
            "NAME": "NAME_2șțț", 
            "MAP": "Map_two", 
            "SURSA": "PPP", 
            "S70": "4324", 
            "TYPE": "Tip2", 
            "LAT_LON": "123/54645", 
            "SCH": "1:578000"
        }
    }
]

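Note that this code is somewhat fragile. It works for data in the format shown in the question, but it won't work if a line contains more than one key-value pair.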
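One way around the one-pair-per-line limitation is to scan tokens instead of lines. The sketch below matches either a complete double-quoted string, which is copied unchanged, or a bare word followed by a colon, which gets quoted. Because string literals are consumed whole, text such as `"1:5000"` inside a value is never touched. It assumes keys consist of word characters (`\w`) and that the input is otherwise valid JSON; `quote_bare_keys` is a hypothetical helper name. The input is the question's one-line string with the missing closing quote after `1:5000` restored (as in `source` above).

```python
# -*- coding: utf-8 -*-
import json
import re

# Either a whole double-quoted string literal (left as-is), or a bare
# word followed by optional whitespace and a colon (an unquoted key).
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|(\w+)(\s*:)')


def quote_bare_keys(text):
    def fix(match):
        if match.group(1) is None:
            return match.group(0)  # string literal: keep unchanged
        return u'"%s"%s' % (match.group(1), match.group(2))
    return TOKEN.sub(fix, text)


# The question's one-line input, with the quote after 1:5000 restored.
raw = u'[{ attributes: { NAME: "Name_1ĂĂÎÎ", TYPE: "Tip1", LOC_JUD: "Bucharest", LAT_LON: "234343/432545", S70: "2342345", MAP: "Map_one", SCH: "1:5000", SURSA: "PPP" } }, { attributes: { NAME: "NAME_2șțț", TYPE: "Tip2", LOC_JUD: "cea", LAT_LON: "123/54645", S70: "4324", MAP: "Map_two", SCH: "1:578000", SURSA: "PPP" } } ] '

data = json.loads(quote_bare_keys(raw))
print(data[1]["attributes"]["NAME"])  # NAME_2șțț
```

This is still a regex repair rather than a real parser, so keys that need quoting for other reasons (for example, containing hyphens) would not be handled.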

As mentioned earlier, the best way to solve this problem is upstream, where the non-JSON is generated.
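If the data happens to be produced by Python code you control (an assumption, since the question doesn't say where it comes from), the upstream fix can be as simple as building ordinary dicts and serializing them with `json.dumps`, which quotes every key for you. The records below are abbreviated to a few fields and are illustrative only.

```python
# -*- coding: utf-8 -*-
import json

# Hypothetical producer side: build the records as plain Python data
# (abbreviated to a few of the question's fields)...
records = [
    {"attributes": {"NAME": u"Name_1ĂĂÎÎ", "TYPE": "Tip1", "SCH": "1:5000"}},
    {"attributes": {"NAME": u"NAME_2șțț", "TYPE": "Tip2", "SCH": "1:578000"}},
]

# ...and let the json module handle quoting and escaping.
# ensure_ascii=False keeps the Romanian characters readable in the output.
payload = json.dumps(records, ensure_ascii=False)

# The consumer can then load it directly, with no regex repair.
loaded = json.loads(payload)
```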
