How to parse a line in Python and split it to store in a dictionary

2 answers

User

#1 · edited 2024-06-01 05:10:20

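When parsing this monster (after you have tracked down its author), your best bet is to identify all the field names up front and then capture the values between them.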

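The simplest approach is to find each field in the line, then extract the value between the equals sign that follows it and the next field found. For example: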

# List all fields here (if possible in order of appearance)
# Everything not listed will end up as a part of another detected field's value
FIELD_LIST = ["Format", "Rnti", "Format0/Format1A Differentiation Flag",
              "Localised / Distributed VRB Assignment Flag", "Resource Block Assignment",
              "Resource Blocks Detail", "Modulation and Coding Scheme",
              "Harq Process Number", "New Data Indicator", "Redundancy Version",
              "TPC Command"]

# let's put the logic for parsing one ugly log line into a function
def parse_ugly_log_line(log):
    field_indexes = {field: log.find(field) for field in FIELD_LIST}  # get field indexes
    field_order = sorted(field_indexes, key=field_indexes.get)  # field names ordered by position (-1, not found, comes first)
    parsed_fields = {}  # store for our fields
    for i, field in enumerate(field_order):
        if field_indexes[field] == -1:  # field not found, skip
            continue
        field_start = log.find("=", field_indexes[field])  # value begins after `=`
        if field_start == -1:  # cannot find the field value, skip
            continue
        # field value ends where the next field begins:
        field_end = field_indexes[field_order[i + 1]] if i < len(field_order) - 1 else None
        if field_end and field_start > field_end:  # overlapping field value, skip
            continue
        parsed_fields[field] = log[field_start + 1:field_end].strip()  # extract the value
    return parsed_fields

# let's now open our log file and parse it line by line:
logs = []  # storage of the parsed data
with open("your_log.txt", "r") as f:
    for line in f:
        logs.append(parse_ugly_log_line(line))

# you can now access individual fields for each of the lines, e.g.:
print(logs[0]["Modulation and Coding Scheme"])  # prints: 5
print(logs[4]["Resource Block Assignment"])  # prints: 0x00000032
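
The comment at the top of the listing warns that anything not in the field list ends up inside another field's value. Here is a minimal sketch of that effect on a made-up line. The helper name split_fields and the sample line are hypothetical, and the helper is a compact variant of the function above, not a replacement for it:

```python
def split_fields(line, fields):
    """Slice `line` into {field: value} using the positions of known field names."""
    # (position, name) for every listed field that occurs in the line, in order of appearance
    found = sorted((line.find(name), name) for name in fields if name in line)
    values = {}
    for i, (pos, name) in enumerate(found):
        eq = line.find("=", pos)  # the value starts after the next '='
        end = found[i + 1][0] if i + 1 < len(found) else len(line)
        if eq == -1 or eq > end:  # no '=' before the next field: skip it
            continue
        values[name] = line[eq + 1:end].strip()
    return values


# hypothetical sample line, not taken from the real log
sample = "Format = 1A Rnti = 0x0012 TPC Command = 1"

# "TPC Command" is not listed, so it is swallowed by the Rnti value
partial = split_fields(sample, ["Format", "Rnti"])
print(partial)  # {'Format': '1A', 'Rnti': '0x0012 TPC Command = 1'}

# once it is listed, every value is cut at the right place
full = split_fields(sample, ["Format", "Rnti", "TPC Command"])
print(full)  # {'Format': '1A', 'Rnti': '0x0012', 'TPC Command': '1'}
```

So if a value looks suspiciously long, the usual cause is a field name missing from the list.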

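You could get a similar effect with a regex (something like (field1|field2|etc)\s*=(.*)(?!field1|field2|etc), capturing the two groups to get (field, value) tuples), but I don't like building overly long regex patterns, and regex engines aren't really designed for this kind of task.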
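
For comparison, here is a rough sketch of what such a regex could look like when it is generated from the field list instead of typed by hand. It differs from the pattern quoted above in two ways: the value is matched lazily, and a lookahead stops it at the next "field =" or at the end of the line. The short field list and the sample line are made up for illustration:

```python
import re

# hypothetical, shortened field list
FIELDS = ["Format", "Rnti", "Format0/Format1A Differentiation Flag", "TPC Command"]

# Try longer names first so a short name such as "Format" cannot shadow a
# longer name that starts with the same text.
names = "|".join(re.escape(f) for f in sorted(FIELDS, key=len, reverse=True))

# group 1: field name, group 2: value (lazy), which ends right before
# the next "<field> =" or at the end of the line
pattern = re.compile(rf"({names})\s*=\s*(.*?)\s*(?=(?:{names})\s*=|$)")

# hypothetical sample line, not taken from the real log
sample = "Format = 1A Rnti = 0x0012 Format0/Format1A Differentiation Flag = 1 TPC Command = 2"

parsed = dict(pattern.findall(sample))
print(parsed["Rnti"])         # 0x0012
print(parsed["TPC Command"])  # 2
```

It works on this sample, but the pattern grows with every field you add, which is exactly why I prefer the plain find-and-slice version.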

User

#2 · edited 2024-06-01 05:10:20

In my opinion, the only way to deal with a file this messy is a regexp.

import re

YOUR_FILE = "your_log.txt"  # placeholder: set this to the path of your log file


def dicts_generator():
    """Generates dicts with data from YOUR_FILE"""

    # Define the search regexp.
    #
    # Note that the regexp is in VERBOSE mode: whitespace is ignored and
    # comments are allowed, so literal spaces have to be escaped.
    # At the end of each line I used \s to make it more visible to you.
    #
    # Each line of the regexp is one parameter. I did not understand what
    # exactly Rnti is, so I could be wrong about its exact definition.


    r = re.compile(r"""

        # (?P<format> .... )  - group named 'format'
        # it will be a dict key

        Format\ =\ (?P<format>.+?)\s+   

        Rnti\ =\ (?P<rnti>.+?)\s+

        Differentiation\ Flag\ =\ (?P<differentiation_flag>.+?)\s+

        # add other parameters here

        """, re.VERBOSE)

    # Read the file line by line and search each line.
    # Searching the whole file at once would also work,
    # but for me this way is clearer.
    with open(YOUR_FILE, mode="rt") as f:
        for line in f:
            for m in r.finditer(line):
                yield m.groupdict()


for d in dicts_generator():
    print(d)   # do whatever you want with dict 'd'.

It prints one dict per regex match, keyed by the group names (format, rnti, differentiation_flag).
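
To make the shape of the result concrete, here is a small self-contained sketch of the same named-group idea run on a made-up line. The sample line, the group name harq and the assumption that values contain no spaces (hence \S+ and \d+ instead of .+?) are mine, not taken from the real log:

```python
import re

# VERBOSE mode: unescaped whitespace and '#' comments inside the pattern are ignored,
# so every literal space is written as '\ '.
line_re = re.compile(r"""
    Format\ =\ (?P<format>\S+)\s+              # value is a run of non-space characters
    Rnti\ =\ (?P<rnti>\S+)\s+
    Harq\ Process\ Number\ =\ (?P<harq>\d+)    # digits only
    """, re.VERBOSE)

# hypothetical sample line, not taken from the real log
sample = "Format = 1A Rnti = 0x0012 Harq Process Number = 3"

m = line_re.search(sample)
fields = m.groupdict() if m else {}
print(fields)  # {'format': '1A', 'rnti': '0x0012', 'harq': '3'}
```

Tighter value patterns like \S+ keep a group from swallowing the text of the following field, but they only work if the values really never contain spaces.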
