Extracting data with Python after filtering on a substring

Posted 2024-06-16 09:42:17


I have a JSON file of DNS traffic in this format:

{
    "index": {
        "_type": "answer_query", 
        "_id": 0, 
        "_index": "index_name"
    }
}

{
    "answer_section": " ", 
    "query_type": "A", 
    "authority_section": "com. 172 IN SOA a.xxxx-xxxx.net. nstld.xxxx-xxxxcom. 1526440480 1800 900 604800 86400", 
    "record_code": "NXDOMAIN", 
    "ip_src": "xx.xx.xx.xx", 
    "response_ip": "xx.xx.xx.xx", 
    "date_time": "2018-05-16T00:57:20Z", 
    "checksum": "CORRECT", 
    "query_name": "xx.xxxx.com.", 
    "port_src": 50223, 
    "question_section": "xx.xxxx.com. IN A", 
    "answer_count_section": 0
}

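I need to extract the records in which the number after the first space in authority_section (172 in this example) is less than 300, ignore the records that do not meet this condition, and write the output to another JSON file.

How can I do this? Thanks.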


2 answers

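Assume stack1.txt is the file you posted. The code below copies it to a new file, stack2.txt, leaving out the "authority_section" line when the value after the space is greater than or equal to 300; in that case stack2.txt is then deleted, so a record that fails the check produces no output. This solution does not parse the JSON, but it depends heavily on the data being formatted consistently.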

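import os

# Copy stack1.txt to stack2.txt line by line; both files are closed by the with block
with open('stack1.txt') as old_file, open('stack2.txt', 'w') as new_file:
    delete_file = False
    for line in old_file:
        # On the "authority_section" line, the second token after the colon is the number to test (172 in the example)
        if not (line.strip().startswith('"authority_section"') and int(line.split(':')[1].split()[1]) >= 300):
            new_file.write(line)
        else:
            delete_file = True

# The record failed the check (value >= 300), so discard the output file
if delete_file:
    os.remove('stack2.txt')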
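
If the format is not consistent enough for line matching, parsing the JSON may be more robust. The following is only a minimal sketch, assuming the file holds several JSON objects written one after another as in the question. The names iter_json_objects, number_after_space, filter_records and filter_file are hypothetical, and writing the result as a single JSON array is an assumption about the desired output. Objects without a usable authority_section, including the "index" entries, are dropped.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between consecutive objects
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def number_after_space(record):
    """Return the second whitespace-separated field of authority_section, or None."""
    fields = str(record.get("authority_section", "")).split()
    if len(fields) > 1 and fields[1].isdigit():
        return int(fields[1])
    return None


def filter_records(text, limit=300):
    """Keep only the records whose number after the space is below limit."""
    kept = []
    for obj in iter_json_objects(text):
        if not isinstance(obj, dict):
            continue
        value = number_after_space(obj)
        if value is not None and value < limit:
            kept.append(obj)
    return kept


def filter_file(src, dst, limit=300):
    """Read src, filter its records, and write the survivors to dst as a JSON array."""
    with open(src) as infile:
        kept = filter_records(infile.read(), limit)
    with open(dst, "w") as outfile:
        json.dump(kept, outfile, indent=4)
```

With the file names used above, this would be called as filter_file('stack1.txt', 'stack2.txt').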

You can try the following:

#!/usr/bin/python3
import json
import re

data = (
    """
    {
         "answer_section": " ",
         "query_type": "A",
         "authority_section": "com. 172 IN SOA a.xxxx-xxxx.net. nstld.xxxx-xxxxcom. 1526440480 1800 900 604800 86400",
         "record_code": "NXDOMAIN",
         "ip_src": "xx.xx.xx.xx",
         "response_ip": "xx.xx.xx.xx",
         "date_time": "2018-05-16T00:57:20Z",
         "checksum": "CORRECT",
         "query_name": "xx.xxxx.com.",
         "port_src": 50223,
         "question_section": "xx.xxxx.com. IN A",
         "answer_count_section": 0
    }
    """
)


json_data = json.loads(data)
print('BEFORE: ', json_data)

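# Matches a number from 1 to 299 with one whitespace character on each side
# (raw string so that \s and \d are not treated as string escapes)
r = re.compile(r'^\s([1-2]\d\d|[1-9]\d|[1-9])\s$')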


found = False
key_to_delete = None

for key, value in json_data.items():
    if value == 0:
        pass
    else:
        tmp = str(value)
        for i in range(0, len(tmp)):
            if r.match(tmp[i:i+3]):
                found = True
                key_to_delete = key
                print('FOUND 1: ', value)
            elif r.match(tmp[i:i+4]):
                found = True
                key_to_delete = key
                print('FOUND 2: ', value)
            elif r.match(tmp[i:i+5]):
                found = True
                key_to_delete = key
                print('FOUND 3: ', value)

if found:
    json_data.pop(key_to_delete)

print('RESULT: ', json_data)

My answer uses a regular expression. Read up on regex for more details.
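
For comparison, a capture group can pull the number out of authority_section directly instead of scanning every value with a sliding window. This is a hedged sketch and not the answer's method: the pattern and the names NUMBER_PATTERN, number_after_first_space and is_below are assumptions made for illustration, and it presumes the string starts with a name followed by whitespace and digits, as in the sample record.

```python
import re

# Hypothetical pattern: a first token, whitespace, then the digits we want to capture
NUMBER_PATTERN = re.compile(r'^\S+\s+(\d+)\b')


def number_after_first_space(authority_section):
    """Return the number following the first token, or None if the string does not fit."""
    match = NUMBER_PATTERN.match(authority_section)
    return int(match.group(1)) if match else None


def is_below(authority_section, limit=300):
    """True when the captured number exists and is smaller than limit."""
    value = number_after_first_space(authority_section)
    return value is not None and value < limit
```

A record would then be kept when is_below(record["authority_section"]) is true; an empty or blank authority_section yields None and is treated as not matching.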
