How do I convert data containing numbers into a dictionary of lists?

import re
import itertools

file_data = [re.findall(r'\d+', i.strip('\n')) for i in open('ground_truth')]
print(file_data)
final_data = [['{}-{}'.format(a, b), list(map(float, c))] for a, b, *c in file_data]
new_data = {a: list(map(lambda x: x[-1], b))
            for a, b in itertools.groupby(sorted(final_data, key=lambda x: x[0]),
                                          key=lambda x: x[0])}
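The sorted() call in this attempt is what makes it O(N log N), and it cannot simply be dropped: itertools.groupby only merges consecutive items with equal keys, so unsorted input yields the same key more than once, and a dict comprehension then keeps only the last group. A minimal illustration, using made-up keys:

```python
from itertools import groupby

# Hypothetical keys in file order; '10-0' appears in two separate runs.
keys = ['10-0', '10-1', '10-0']

# Without sorting, groupby yields '10-0' twice (once per consecutive run).
unsorted_groups = [(k, len(list(g))) for k, g in groupby(keys)]
print(unsorted_groups)  # [('10-0', 1), ('10-1', 1), ('10-0', 1)]

# A dict built from those groups silently keeps only the last '10-0' run.
as_dict = {k: len(list(g)) for k, g in groupby(keys)}
print(as_dict)  # {'10-0': 1, '10-1': 1}

# Sorting first makes equal keys adjacent, so each key forms a single group.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(keys))]
print(sorted_groups)  # [('10-0', 2), ('10-1', 1)]
```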



Answer #1 · posted 2024-04-24 16:46:05

You need a slightly more complex regular expression to ignore the decimal .0 parts:

re.findall(r'(?<!\.)\d+', i)

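This uses a negative lookbehind to skip any digit that is immediately preceded by a '.'. That ignores the .0 parts, but for something like .01 only the first digit after the '.' is skipped, so the remaining digits would still be picked up. For your input, that should be sufficient.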
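A quick check of the lookbehind against the question's plain \d+ pattern, on one line in the format used in the demo below. The last input ('2089.01,545.0') is made up to show the .01 caveat:

```python
import re

plain = re.compile(r'\d+')                    # the pattern from the question
with_lookbehind = re.compile(r'(?<!\.)\d+')  # skip digits right after a '.'

line = 'values/test/10/blueprint-0.png,2089.0,545.0,2100.0,546.0'

# Plain \d+ also returns every '0' that follows a decimal point.
print(plain.findall(line))
# ['10', '0', '2089', '0', '545', '0', '2100', '0', '546', '0']

# The lookbehind rejects those trailing zeros.
print(with_lookbehind.findall(line))
# ['10', '0', '2089', '545', '2100', '546']

# Caveat: only the first digit after '.' is skipped, so '.01' leaks a '1'.
print(with_lookbehind.findall('2089.01,545.0'))
# ['2089', '1', '545']
```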

I'd use a regular loop here, both for readability and to keep the code O(N) instead of O(N log N) (no sorting needed):

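import re

new_data = {}
with open('ground_truth') as f:
    for line in f:
        # filename digits (k1, k2) followed by the four box coordinates
        k1, k2, x1, y1, x2, y2 = map(int, re.findall(r'(?<!\.)\d+', line))
        key = '{}-{}'.format(k1, k2)
        # one 8-value corner list per line, grouped under its 'k1-k2' key
        new_data.setdefault(key, []).append([x1, y1, x2, y1, x2, y2, x1, y2])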

I've hard-coded your x, y combinations here because you seem to want a very specific order.
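Reading the hard-coded list, [x1, y1, x2, y1, x2, y2, x1, y2] looks like the four corners of the box spanned by (x1, y1) and (x2, y2), visited as (x1, y1), (x2, y1), (x2, y2), (x1, y2). That interpretation is an assumption; the helper below (box_to_corners is a hypothetical name) only makes that order explicit so it is easier to change:

```python
def box_to_corners(x1, y1, x2, y2):
    """Flatten a box given by two opposite corners into 8 coordinates.

    Hypothetical helper: assumes the wanted order is
    (x1, y1) -> (x2, y1) -> (x2, y2) -> (x1, y2), matching the hard-coded list.
    """
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    # Flatten [(x, y), ...] into [x, y, x, y, ...].
    return [coord for point in corners for coord in point]


print(box_to_corners(2089, 545, 2100, 546))
# [2089, 545, 2100, 545, 2100, 546, 2089, 546]
```

In the loop, the append call would then become new_data.setdefault(key, []).append(box_to_corners(x1, y1, x2, y2)).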

Demo:

>>> import re
>>> file_data = '''\
... values/test/10/blueprint-0.png,2089.0,545.0,2100.0,546.0
... values/test/10/blueprint-0.png,2112.0,545.0,2136.0,554.0
... '''
>>> new_data = {}
>>> for line in file_data.splitlines(True):
...     k1, k2, x1, y1, x2, y2 = map(int, re.findall(r'(?<!\.)\d+', line))
...     key = '{}-{}'.format(k1, k2)
...     new_data.setdefault(key, []).append([x1, y1, x2, y1, x2, y2, x1, y2])
...
>>> new_data
{'10-0': [[2089, 545, 2100, 545, 2100, 546, 2089, 546], [2112, 545, 2136, 545, 2136, 554, 2112, 554]]}

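A nice alternative is to treat your input file as CSV: the csv module is a good way to split the columns, after which you only need to extract the numbers from the first (filename) column: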

import csv
import re

new_data = {}
# newline='' is what the csv module documentation recommends for files
with open('ground_truth', newline='') as f:
    reader = csv.reader(f)
    for filename, *numbers in reader:
        k1, k2 = re.findall(r'\d+', filename)  # no need to even convert to int
        key = '{}-{}'.format(k1, k2)
        # '2089.0' -> 2089.0 -> 2089
        x1, y1, x2, y2 = (int(float(n)) for n in numbers)
        new_data.setdefault(key, []).append([x1, y1, x2, y1, x2, y2, x1, y2])
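To try the csv version without a ground_truth file, the same loop can run over an io.StringIO holding the two sample lines from the demo. Wrapping it in a function (parse_ground_truth is a hypothetical name) makes that easy:

```python
import csv
import io
import re


def parse_ground_truth(f):
    """Group box corners by 'k1-k2' key, using the csv-based approach above."""
    new_data = {}
    for filename, *numbers in csv.reader(f):
        k1, k2 = re.findall(r'\d+', filename)  # e.g. '10', '0'
        key = '{}-{}'.format(k1, k2)
        x1, y1, x2, y2 = (int(float(n)) for n in numbers)
        new_data.setdefault(key, []).append([x1, y1, x2, y1, x2, y2, x1, y2])
    return new_data


sample = io.StringIO(
    'values/test/10/blueprint-0.png,2089.0,545.0,2100.0,546.0\n'
    'values/test/10/blueprint-0.png,2112.0,545.0,2136.0,554.0\n'
)
print(parse_ground_truth(sample))
# {'10-0': [[2089, 545, 2100, 545, 2100, 546, 2089, 546], [2112, 545, 2136, 545, 2136, 554, 2112, 554]]}
```

With the real file, pass the object returned by open('ground_truth', newline='') instead of the StringIO.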
