Handling invalid JSON

Posted 2024-03-28 10:39:48


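I'm getting malformed JSON: the value of the "text" key can contain user comments with unescaped double quotes, so I need to fix those quotes before the JSON will parse.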

{"test":[{"id":"1234","user":{"id":"1234"},"text":"test, "." test " 1234"","created":"2019-01-09"}]}

I tried the code below from another thread, but couldn't get it to work. Any ideas?

import json, re

while True:
    try:
        result = json.loads(test.json)   # try to parse...
        break                    # parsing worked -> exit loop
    except Exception as e:
        # "Expecting , delimiter: line 34 column 54 (char 1158)"
        # position of unexpected character after '"'
        unexp = int(re.findall(r'\(char (\d+)\)', str(e))[0])
        # position of unescaped '"' before that
        unesc = s.rfind(r'"', 0, unexp)
        s = s[:unesc] + r'\"' + s[unesc+1:]
        # position of correspondig closing '"' (+2 for inserted '\')
        closg = s.find(r'"', unesc + 2)
        s = s[:closg] + r'\"' + s[closg+1:]
print result

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    unexp = int(re.findall(r'\(char (\d+)\)', str(e))[0])
IndexError: list index out of range
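The IndexError most likely hides the real problem: `test.json` is not a string holding the JSON (and `s` is never defined), so `json.loads(test.json)` fails with a NameError or AttributeError before any parsing happens. `except Exception` catches it, that message contains no "(char N)", `re.findall` returns an empty list, and `[0]` raises. Below is a minimal sketch of the same repair loop, assuming Python 3.5+ (where `json.JSONDecodeError` exposes the error position as `pos`, so no regex on the message is needed) and the malformed JSON already held in a string. The helper name `repair_unescaped_quotes` and the retry cap `max_attempts` are hypothetical, added so a hopeless input cannot loop forever.

```python
import json

# The malformed input from the question, held in a str.
s = '{"test":[{"id":"1234","user":{"id":"1234"},"text":"test, "." test " 1234"","created":"2019-01-09"}]}'


def repair_unescaped_quotes(s, max_attempts=100):
    """Hypothetical helper: escape stray quotes until json.loads succeeds."""
    for _ in range(max_attempts):
        try:
            return json.loads(s)  # try to parse...
        except json.JSONDecodeError as e:
            # e.pos is the index of the unexpected character
            unexp = e.pos
            # position of the unescaped '"' before that
            unesc = s.rfind('"', 0, unexp)
            if unesc == -1:
                raise  # nothing to escape: give up with the original error
            s = s[:unesc] + '\\"' + s[unesc + 1:]
            # position of the corresponding closing '"' (+2 for the inserted '\')
            closg = s.find('"', unesc + 2)
            if closg == -1:
                raise
            s = s[:closg] + '\\"' + s[closg + 1:]
    raise ValueError('could not repair the JSON in %d attempts' % max_attempts)


result = repair_unescaped_quotes(s)
print(result)
```

This is a heuristic: if the stray quotes in a comment happen to form valid JSON, no error is raised and nothing gets repaired.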

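Expected result (the "text" value with its inner double quotes escaped):

{"test":[{"id":"1234","user":{"id":"1234"},"text":"test \".\" test \" 1234\"","created":"2019-01-09"}]}

Or, alternatively, we could remove all the double quotes between "text": and "created" and then wrap the value of the "text" key in a single pair of start and end quotes, which would also solve my problem:

{"test":[{"id":"1234","user":{"id":"1234"},"text":"test . test 1234","created":"2019-01-09"}]}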
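The second alternative (drop every double quote between "text": and "created", then wrap the value in one fresh pair) can be sketched with plain string slicing. This assumes the broken value is always directly followed by `,"created":`; both marker strings are hard-coded for this sample.

```python
import json

s = '{"test":[{"id":"1234","user":{"id":"1234"},"text":"test, "." test " 1234"","created":"2019-01-09"}]}'

start_marker = '"text":'    # assumed: the broken value always follows this key
end_marker = ',"created":'  # ...and is always followed by this one

start = s.index(start_marker) + len(start_marker)
end = s.index(end_marker, start)

# Remove every double quote in the raw value, then wrap it in one fresh pair
cleaned = '"%s"' % s[start:end].replace('"', '')
fixed = s[:start] + cleaned + s[end:]
print(json.loads(fixed))
```

On this sample the comma and the extra space left where `" ` was removed both survive, so the text comes out as `test, . test  1234` rather than exactly the second expected output.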

1 answer
Anonymous user
#1 · Posted 2024-03-28 10:39:48

You only need to edit that one line: use a regex to match it, fix the value, then join it back with the rest of the JSON string so it can be parsed.

import re
import json

json_str = '''{
  "test": [
    {
      "id": "1234",
      "user": {
        "id": "1234"
      },
      "text": "test "." test " 1234"",
      "created": "2019-01-09"
    }
  ]
}'''

lines = []
# match the line holding the "text" key (raw string, so \s stays a regex escape)
text_line = re.compile(r'^\s+"text"')

for line in json_str.split('\n'):
    # if a match happens, this will execute and fix the "text" line
    if text_line.match(line):
        # split on the first ':' only, in case the comment itself contains one
        k, v = line.split(':', 1)
        # v ends with a comma, so drop it before slicing off the wrapping
        # double quotes; only the inner quotes then get escaped
        # (the comma is re-added, so this assumes "text" is not the last key)
        v = '"%s",' % v.strip().rstrip(',')[1:-1].replace('"', '\\"')
        line = '%s: %s' % (k, v)
    # otherwise, carry on
    lines.append(line)

print('\n'.join(lines))

{
  "test": [
    {
      "id": "1234",
      "user": {
        "id": "1234"
      },
      "text": "test \".\" test \" 1234\"",
      "created": "2019-01-09"
    }
  ]
}

# Now you can parse it with json.loads
json.loads('\n'.join(lines))

{'test': [{'id': '1234', 'user': {'id': '1234'}, 'text': 'test "." test " 1234"', 'created': '2019-01-09'}]}

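Edit: the OP has pointed out that the JSON is all on one line.

This could be optimized, but you can use re to find all the keys in the JSON and then fix the value much as before: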

import re
import json

# Now all one line
s = '''{"test":[{"id":"1234","user":{"id":"1234"},"text":"test, "." test " 1234"","created":"2019-01-09"}]}'''

# find our keys which will serve as our placeholders
keys = re.findall(r'"\w+":', s)

# ['"test":', '"id":', '"user":', '"id":', '"text":', '"created":']

# now we can find the indices for those keys to mark start
# and finish locations to extract the value
# (here "text" is the second-to-last key and "created" the last)
start, finish = s.index(keys[-2]), s.index(keys[-1])

k, v = s[start:finish].split(':', 1)
# replace v as before: drop the trailing comma, strip the wrapping quotes,
# escape the inner ones, then re-wrap and re-add the comma
v = '"%s",' % v.strip().rstrip(',')[1:-1].replace('"', '\\"')
# '"test, \\".\\" test \\" 1234\\"",'

# rebuild the string, since Python strings are immutable
s = s[:start] + '%s: %s' % (k, v) + s[finish:]

json.loads(s)
# {'test': [{'id': '1234', 'user': {'id': '1234'}, 'text': 'test, "." test " 1234"', 'created': '2019-01-09'}]}
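Picking the value by position (`keys[-2]`, `keys[-1]`) only works while "text" is the second-to-last key. A slightly more general sketch, assuming the broken value is always a string directly followed by a known key, takes both key names as arguments and lets `json.dumps` add the wrapping quotes and escape the inner ones. `escape_value_between` is a hypothetical helper name, not a library function.

```python
import json


def escape_value_between(s, key, next_key):
    """Hypothetical helper: re-escape the raw string value of `key`.

    Assumes the value is wrapped in one pair of double quotes and is
    directly followed by `,"next_key":`.
    """
    start_marker = '"%s":' % key
    end_marker = ',"%s":' % next_key
    start = s.index(start_marker) + len(start_marker)
    end = s.index(end_marker, start)
    inner = s[start:end].strip()[1:-1]  # drop the wrapping quotes
    # json.dumps re-adds the wrapping quotes and escapes '"' and '\'
    return s[:start] + json.dumps(inner) + s[end:]


s = '{"test":[{"id":"1234","user":{"id":"1234"},"text":"test, "." test " 1234"","created":"2019-01-09"}]}'
print(json.loads(escape_value_between(s, 'text', 'created')))
```

Because `json.dumps` also escapes backslashes, a value that already contains a valid escape such as `\n` would come back as a literal backslash followed by `n`; the hand-written `.replace('"', '\\"')` above leaves backslashes alone.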

Note that this works for this particular use case. I could try to work out a more general approach, but this should at least get you started.
