How to parse a line in Python and split it to store in a dictionary

Posted 2024-06-01 05:10:20


I have a file in which every line contains a string like this:

Format = 1A Rnti = 65535 (SI-RNTI) Format0/Format1A Differentiation Flag = 1 Localised / Distributed VRB Assignment Flag = 0 Resource Block Assignment = 0x00000000 Resource Blocks Detail = (RBstart:0, Lcrbs:1, Ndlrb:50) Modulation and Coding Scheme = 5 Harq Process Number = 16 (Broadcast HARQ Process) New Data Indicator = 0 Redundancy Version = 0 TPC Command = 0 (-1 dB)

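I want to use the field names as keys and split on "=" to store the corresponding values. The field names are Format, Rnti, Format0/Format1A Differentiation Flag, Localised / Distributed VRB Assignment Flag, and so on. I tried the following:

[code block missing from the source]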

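But I can't split the field names (keys) and the values correctly. Is there a way to define all the key names up front and then store the corresponding values from the string in a dictionary?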


Tags: file, string, format, process, resource, field name, flag, assignment
2 Answers

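When parsing this monstrosity (after you've tracked down its author), your best bet is to identify all the field names up front and then capture the values that sit between them.

The simplest way is to locate each field in the line, then extract whatever lies between the equals sign that follows it and the start of the next field found. For example: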

# List all fields here (if possible in order of appearance)
# Everything not listed will end up as a part of another detected field's value
FIELD_LIST = ["Format", "Rnti", "Format0/Format1A Differentiation Flag",
              "Localised / Distributed VRB Assignment Flag", "Resource Block Assignment",
              "Resource Blocks Detail", "Modulation and Coding Scheme",
              "Harq Process Number", "New Data Indicator", "Redundancy Version",
              "TPC Command"]

# let's put the logic for parsing our ugly log line into a function
def parse_ugly_log_line(log):
    field_indexes = {field: log.find(field) for field in FIELD_LIST}  # get field indexes
    field_order = sorted(field_indexes, key=field_indexes.get)  # sort indexes
    parsed_fields = {}  # store for our fields
    for i, field in enumerate(field_order):
        if field_indexes[field] == -1:  # field not found, skip
            continue
        field_start = log.find("=", field_indexes[field])  # value begins after `=`
        if field_start == -1:  # cannot find the field value, skip
            continue
        # field value ends where the next field begins:
        field_end = field_indexes[field_order[i + 1]] if i < len(field_order) - 1 else None
        if field_end and field_start > field_end:  # overlapping field value, skip
            continue
        parsed_fields[field] = log[field_start + 1:field_end].strip()  # extract the value
    return parsed_fields

# let's now open our log file and parse it line by line:
logs = []  # storage of the parsed data
with open("your_log.txt", "r") as f:
    for line in f:
        logs.append(parse_ugly_log_line(line))

# you can now access individual fields for each of the lines, e.g.:
print(logs[0]["Modulation and Coding Scheme"])  # prints: 5
print(logs[4]["Resource Block Assignment"])  # prints: 0x00000032

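A similar result can be achieved with a regex (something like (field1|field2|etc)\s*=(.*)(?!field1|field2|etc), capturing two groups to get (field, value) tuples), but I don't like building overly long regex patterns, and regex engines aren't really designed for this kind of task.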
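For comparison, here is a minimal sketch of what such a regex-based variant might look like. It is not the answerer's code: the names `FIELDS`, `PATTERN` and `parse_with_regex` are made up for illustration, and the field list is assumed to be complete. Instead of the negative lookahead quoted above, it uses a lazy value group followed by a positive lookahead for "the next known field name or the end of the line", which is one way to make the idea work.

```python
import re

# Assumed list of field names, copied from the sample line in the question.
FIELDS = ["Format", "Rnti", "Format0/Format1A Differentiation Flag",
          "Localised / Distributed VRB Assignment Flag", "Resource Block Assignment",
          "Resource Blocks Detail", "Modulation and Coding Scheme",
          "Harq Process Number", "New Data Indicator", "Redundancy Version",
          "TPC Command"]

# Longest names first, so the longer field names are tried before plain "Format".
_names = "|".join(re.escape(f) for f in sorted(FIELDS, key=len, reverse=True))

# Group 1: a known field name. Group 2: its value, matched lazily up to the
# next "<known field> =" or up to the end of the line.
PATTERN = re.compile(rf"({_names})\s*=\s*(.*?)(?=\s+(?:{_names})\s*=|\s*$)")

def parse_with_regex(line):
    """Return a {field name: value} dict for one log line."""
    return dict(PATTERN.findall(line))

SAMPLE = ("Format = 1A Rnti = 65535 (SI-RNTI) Format0/Format1A Differentiation Flag = 1 "
          "Localised / Distributed VRB Assignment Flag = 0 Resource Block Assignment = 0x00000000 "
          "Resource Blocks Detail = (RBstart:0, Lcrbs:1, Ndlrb:50) Modulation and Coding Scheme = 5 "
          "Harq Process Number = 16 (Broadcast HARQ Process) New Data Indicator = 0 "
          "Redundancy Version = 0 TPC Command = 0 (-1 dB)")

parsed = parse_with_regex(SAMPLE)
print(parsed["Rnti"])         # prints: 65535 (SI-RNTI)
print(parsed["TPC Command"])  # prints: 0 (-1 dB)
```

As with the `find`-based version, any text that is not a listed field name ends up inside the value of the preceding field.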

I think the only way to handle such a messy file is a regexp.

import re

YOUR_FILE = "your_log.txt"  # placeholder: replace with the path to your file


def dicts_generator():
    """Generate dicts with data from YOUR_FILE."""

    # Define the search regexp.
    #
    # Note that the regexp is in VERBOSE mode: whitespace is ignored and
    # comments are allowed, so literal spaces have to be escaped.
    # At the end of each parameter I use \s to make the separator more visible.
    #
    # Each line of the regexp is one parameter. I did not understand what
    # exactly Rnti is, so its definition here may be wrong.


    r=re.compile(r"""

        # (?P<format> .... )  - group named 'format'
        # it will be a dict key

        Format\ =\ (?P<format>.+?)\s+   

        Rnti\ =\ (?P<rnti>.+?)\s+

        # use the full field name so "Format0/Format1A" does not leak into rnti
        Format0/Format1A\ Differentiation\ Flag\ =\ (?P<differentiation_flag>.+?)\s+

        # add other parameters here

        """, re.VERBOSE)

    # Read the file line by line and search each line.
    # Searching the whole file at once would also work,
    # but for me this way is clearer.
    with open(YOUR_FILE, mode="r") as f:
        for line in f:
            for m in r.finditer(line):
                yield m.groupdict()


for d in dicts_generator():
    print(d)   # do whatever you want with dict 'd'.

It prints:

[output missing from the source]
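The "add other parameters here" step can get tedious by hand. As a rough sketch (not part of the original answer), the pattern with named groups could be generated from a list of field names instead. The helpers `group_name`, `ORDERED_PATTERN` and `parse_ordered` are hypothetical names, and the sketch assumes that every line contains all of the fields in exactly this order.

```python
import re

# Assumed field names, in the order they appear in the sample line.
FIELDS = ["Format", "Rnti", "Format0/Format1A Differentiation Flag",
          "Localised / Distributed VRB Assignment Flag", "Resource Block Assignment",
          "Resource Blocks Detail", "Modulation and Coding Scheme",
          "Harq Process Number", "New Data Indicator", "Redundancy Version",
          "TPC Command"]

def group_name(field):
    """Turn a field name into a valid group name, e.g. "TPC Command" -> "tpc_command"."""
    return re.sub(r"\W+", "_", field).strip("_").lower()

# One "<name> = <lazy value>" piece per field, joined by whitespace;
# the last value runs up to the end of the line.
_parts = [rf"{re.escape(f)}\s*=\s*(?P<{group_name(f)}>.+?)" for f in FIELDS]
ORDERED_PATTERN = re.compile(r"\s+".join(_parts) + r"\s*$")

def parse_ordered(line):
    """Return a dict keyed by group name, or None if the line does not match."""
    m = ORDERED_PATTERN.search(line)
    return m.groupdict() if m else None

SAMPLE = ("Format = 1A Rnti = 65535 (SI-RNTI) Format0/Format1A Differentiation Flag = 1 "
          "Localised / Distributed VRB Assignment Flag = 0 Resource Block Assignment = 0x00000000 "
          "Resource Blocks Detail = (RBstart:0, Lcrbs:1, Ndlrb:50) Modulation and Coding Scheme = 5 "
          "Harq Process Number = 16 (Broadcast HARQ Process) New Data Indicator = 0 "
          "Redundancy Version = 0 TPC Command = 0 (-1 dB)")

d = parse_ordered(SAMPLE)
print(d["rnti"])                 # prints: 65535 (SI-RNTI)
print(d["harq_process_number"])  # prints: 16 (Broadcast HARQ Process)
```

The trade-off is strictness: a line with a missing or reordered field returns None rather than a partial dictionary, which may or may not be what you want for a messy log.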
