Reading a large JSON file that contains multiple JSON objects with Python ijson

{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}
{"name":"engine_speed","value":772,"timestamp":1364323939.027000}
{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}
{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.035000}

2 answers

User

Answer 1 · edited 2024-05-15 04:50:07

Since the data you provided looks more like a set of lines, each of which is a standalone JSON document, it should be parsed line by line:

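# Each JSON object is small, so there is no need for iterative (streaming) parsing
import json

filename = "data.json"  # hypothetical path; replace with your own file
with open(filename, 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which json.loads would reject
        data = json.loads(line)
        # data['name'], data['value'] and data['timestamp'] now
        # contain the corresponding values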
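You can wrap the same line-by-line idea in a generator so that callers just iterate over parsed records. This is a minimal sketch. The helper name `iter_json_lines` and the in-memory `SAMPLE` string are illustrative assumptions, not part of the answer. With a real file, pass the object returned by `open(filename, encoding="utf-8")` instead of `io.StringIO`.

```python
import io
import json
from typing import IO, Any, Iterator


def iter_json_lines(fp: IO[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line (JSON Lines style)."""
    for line_number, line in enumerate(fp, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line is broken instead of a bare decode error
            raise ValueError(f"invalid JSON on line {line_number}: {exc}") from exc
        yield record


# Hypothetical sample input: the records from the question, one per line
SAMPLE = """\
{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}
{"name":"engine_speed","value":772,"timestamp":1364323939.027000}
{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}
{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.035000}
"""

for record in iter_json_lines(io.StringIO(SAMPLE)):
    print(record["name"], record["value"], record["timestamp"])
```

The generator produces records one at a time, so memory use is bounded by the longest line rather than by the size of the file. A malformed line is reported by its line number.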

User

Answer 2 · edited 2024-05-15 04:50:07

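Unfortunately, the ijson library (v2.3, as of March 2018) cannot parse multiple JSON objects. It only handles a single top-level object. If you try to parse a second one, you get the error "ijson.common.JSONError: Additional data". See the bug report here: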

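This is a significant limitation. However, as long as every JSON object is followed by a line break (newline character), you can parse each line independently, as follows: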

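import io
import ijson

filename = "data.json"  # hypothetical path; replace with your own file

# Binary mode: ijson reads bytes, and len(line) is then a true byte offset
with open(filename, "rb") as json_file:
    cursor = 0
    for line_number, line in enumerate(json_file):
        print("Processing line", line_number + 1, "at byte offset:", cursor)
        if line.strip():  # an empty line is not valid JSON, so skip it
            # Use a new parser for each line, reading it as a small file-like object
            json_parser = ijson.parse(io.BytesIO(line))
            for prefix, event, value in json_parser:
                print("prefix=", prefix, "event=", event, "value=", value)
        cursor += len(line)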
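Splitting by line only works when every object ends with a newline. Sometimes several objects share a line, separated only by spaces. In that case, one standard-library option is `json.JSONDecoder.raw_decode`, which parses one value and returns the index where it stopped.

The sketch below reads the input in chunks so the whole file never has to be in memory at once. It assumes every top-level value is an object or array, which holds for the data in the question. `iter_concatenated_json` and `chunk_size` are hypothetical names. Newer ijson releases (3.x) also document a `multiple_values=True` option for this case, so check the documentation for your installed version.

```python
import io
import json
from typing import IO, Any, Iterator

JSON_WHITESPACE = " \t\n\r"


def iter_concatenated_json(fp: IO[str], chunk_size: int = 65536) -> Iterator[Any]:
    """Yield top-level JSON objects/arrays from a stream, newline-separated or not.

    Assumes every top-level value is an object or array, so a successful
    raw_decode() always means the value was complete in the buffer.
    """
    decoder = json.JSONDecoder()
    buffer, pos, eof = "", 0, False
    while True:
        # raw_decode() does not skip leading whitespace, so skip it here
        while pos < len(buffer) and buffer[pos] in JSON_WHITESPACE:
            pos += 1
        if pos < len(buffer):
            try:
                obj, pos = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                if eof:
                    raise  # no more data can complete it: malformed input
            else:
                yield obj
                continue
        elif eof:
            return
        # The value is incomplete (or the buffer is empty): keep unparsed text, read more
        chunk = fp.read(chunk_size)
        eof = not chunk
        buffer = buffer[pos:] + chunk
        pos = 0


# Hypothetical input: objects separated only by spaces on a single line
SAMPLE_ONE_LINE = (
    '{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000} '
    '{"name":"engine_speed","value":772,"timestamp":1364323939.027000} '
    '{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}'
)

for record in iter_concatenated_json(io.StringIO(SAMPLE_ONE_LINE)):
    print(record["name"], record["value"])
```

Passing a small `chunk_size` (even 1) exercises the path where an object is split across reads. The default of 64 KiB is an arbitrary choice.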

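This line-by-line approach still streams the file instead of loading it entirely into memory, so it can handle large JSON files. It also uses the line-streaming technique from "How to jump to a particular line in a huge text file?" and enumerate() from "Accessing the index in 'for' loops?".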
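The "jump to a particular line" technique mentioned above comes down to remembering where each line starts, so that a later `seek()` can go straight to it. Here is a minimal sketch, assuming one JSON object per line. The helper names `build_line_index` and `read_record` are hypothetical. The input is read in binary mode because byte counts, unlike character counts of decoded text, are valid `seek()` offsets.

```python
import io
import json
from typing import Any, BinaryIO, List


def build_line_index(fp: BinaryIO) -> List[int]:
    """Return the byte offset where each line starts, in one streaming pass."""
    offsets = []
    cursor = 0
    for line in fp:  # binary mode: len(line) counts bytes
        offsets.append(cursor)
        cursor += len(line)
    return offsets


def read_record(fp: BinaryIO, offsets: List[int], line_number: int) -> Any:
    """Seek straight to a 1-based line number and parse only that line."""
    fp.seek(offsets[line_number - 1])
    return json.loads(fp.readline())  # json.loads accepts UTF-8 bytes


# io.BytesIO stands in for open("data.json", "rb") (hypothetical path) so the sketch runs as-is
data = io.BytesIO(
    b'{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}\n'
    b'{"name":"engine_speed","value":772,"timestamp":1364323939.027000}\n'
    b'{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}\n'
)
index = build_line_index(data)
print(read_record(data, index, 2))
```

Building the index costs one full pass over the file. After that, any record can be fetched without rereading everything before it.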