How to parse specific unique values from a JSON Lines file with Python and store them in an array

Posted 2024-04-24 15:06:42

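The program needs to parse a JSON Lines file and store data in an array. The only data that actually needs to be stored is whatever value follows "SRC/Word1" on each line.

Here is a sample of the JSON Lines file: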

{"Event UTC": "2020-12-21 05:23:06", "Event Time": "00:23:06:94", "SRC/Word1": " ", "Word2": " ", "Word3": " "}
{"Event UTC": "2020-12-21 05:30:53", "Event Time": "00:30:53:95", "SRC/Word1": "E1F25701", "Word2": "A29C7E68", "Word3": " "}
{"Event UTC": "2020-12-21 05:31:04", "Event Time": "00:31:04:34", "SRC/Word1": "E1F25701", "Word2": "D529F3D7", "Word3": " "}
{"Event UTC": "2020-12-21 10:18:54", "Event Time": "05:18:54:45", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C", "Word3": " "}

Here is the code I have so far:

import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        data.append(json.loads(line))
        print(data)

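The data array should end up containing something like data = ['E1F25701', 'E15511D7'], with duplicates skipped and blank values ignored.

Any idea how to do this?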


2 answers

See below (data stands for the lines loaded from the file):

data = [{"Event UTC": "2020-12-21 05:23:06", "Event Time": "00:23:06:94", "SRC/Word1": " ", "Word2": " ", "Word3": " "},
        {"Event UTC": "2020-12-21 05:30:53", "Event Time": "00:30:53:95", "SRC/Word1": "E1F25701", "Word2": "A29C7E68",
         "Word3": " "},
        {"Event UTC": "2020-12-21 05:31:04", "Event Time": "00:31:04:34", "SRC/Word1": "E1F25701", "Word2": "D529F3D7",
         "Word3": " "},
        {"Event UTC": "2020-12-21 10:18:54", "Event Time": "05:18:54:45", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C",
         "Word3": " "}]
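# dict.fromkeys de-duplicates while keeping first-seen order; a plain set has no guaranteed order
data_sub_set = list(dict.fromkeys(x["SRC/Word1"] for x in data if x["SRC/Word1"].strip()))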
print(data_sub_set)

Output

['E1F25701', 'E15511D7']
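
To tie this back to the file in the question, the list of dicts can be built from the lines first. The following is only a minimal sketch, assuming every non-blank line is one valid JSON object that contains the field; `load_json_lines` and `unique_field_values` are hypothetical helper names, not part of the answer above.

```python
import json


def load_json_lines(lines):
    """Parse an iterable of JSON Lines text into a list of dicts (blank lines skipped)."""
    return [json.loads(line) for line in lines if line.strip()]


def unique_field_values(records, field="SRC/Word1"):
    """Return the non-blank values of `field`, de-duplicated in first-seen order."""
    values = (rec[field] for rec in records if rec[field].strip())
    return list(dict.fromkeys(values))


# Shortened sample lines in the same shape as the question's file.
sample = [
    '{"Event UTC": "2020-12-21 05:23:06", "SRC/Word1": " ", "Word2": " "}',
    '{"Event UTC": "2020-12-21 05:30:53", "SRC/Word1": "E1F25701", "Word2": "A29C7E68"}',
    '{"Event UTC": "2020-12-21 05:31:04", "SRC/Word1": "E1F25701", "Word2": "D529F3D7"}',
    '{"Event UTC": "2020-12-21 10:18:54", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C"}',
]

result = unique_field_values(load_json_lines(sample))
print(result)  # ['E1F25701', 'E15511D7']
```

With a real file, the open file object can be passed in place of `sample`, since iterating over it yields one line at a time.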

A parsed JSON object is accessed just like a dictionary. If you are after the SRC/Word1 field, you want:

import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        data.append(json.loads(line)['SRC/Word1'])  # note the dict-style field access here
print(data)  # print once, after the whole file has been read

However, you may want to skip blank entries, or add some error handling if the JSON does not always contain that field.

Edit: just saw your "skip duplicates and ignore empty entries" comment.

import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        value = json.loads(line).get('SRC/Word1', '')
        # keep only non-blank values (strip() also rejects the '' default) not already in the array
        if value.strip() and value not in data:
            data.append(value)
print(data)  # print once, after the whole file has been read
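
For larger files, `value not in data` rescans the whole list on every line, and a single malformed line makes `json.loads` raise. Below is a sketch of the same loop with a set for membership checks and basic error handling. It assumes that invalid or non-object lines should simply be skipped, which may not suit every use case; `collect_unique` is a hypothetical name.

```python
import json


def collect_unique(lines, field="SRC/Word1"):
    """Collect non-blank string values of `field` in first-seen order, without duplicates."""
    seen = set()   # fast membership test
    unique = []    # preserves first-seen order
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: silently skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # assumption: only JSON objects are of interest
        value = record.get(field, "")
        if isinstance(value, str) and value.strip() and value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


# Sample input including a blank value, a duplicate, a missing field and a malformed line.
lines = [
    '{"SRC/Word1": " ", "Word2": " "}',
    '{"SRC/Word1": "E1F25701", "Word2": "A29C7E68"}',
    'not json at all',
    '{"Word2": "no SRC field here"}',
    '{"SRC/Word1": "E1F25701", "Word2": "D529F3D7"}',
    '{"SRC/Word1": "E15511D7", "Word2": "1F6FC55C"}',
]

unique_values = collect_unique(lines)
print(unique_values)  # ['E1F25701', 'E15511D7']
```

If silently dropping bad lines is too lenient, the `except` branch is the natural place to log the offending line or re-raise.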
