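Filter/group a list of dictionaries by a list of keys

2 answers

User

#1 · edited 2024-05-14 22:05:28

This is fairly simple. First, you need some way to serialize the relevant data in a dict. I'll use the (very simple) approach below, but depending on how complex your data is, you may need something more robust: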

def serialize(d, keys):
    # str() so that non-string values can be joined as well
    return ','.join(str(d[key]) for key in keys)

Then you just store all of these serialized values in a list. The index of a value in that list is the group's ID.
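One way this step might look, as a rough sketch rather than the answerer's own code. The names `assign_groups` and `seen` are made up for illustration, and skipping dicts that lack one of the keys is an assumption about the desired behaviour:

```python
def serialize(d, keys):
    # str() lets non-string values be joined too
    return ','.join(str(d[key]) for key in keys)


def assign_groups(list_of_dicts, keys):
    """Return (dict, group_id) pairs; dicts missing any key are skipped."""
    seen = []  # seen[i] is the serialized value that defines group i
    result = []
    for d in list_of_dicts:
        if not all(key in d for key in keys):
            continue
        s = serialize(d, keys)
        if s not in seen:
            seen.append(s)
        # The position of the serialized value in the list is the group ID.
        result.append((d, seen.index(s)))
    return result


data = [
    {'id': 'id1', 'key1': 'value_x', 'key3': 'value_z'},
    {'id': 'id3', 'key3': 'value_v'},
    {'id': 'id2', 'key1': 'value_x', 'key3': 'value_z'},
]
pairs = assign_groups(data, ['key1', 'key3'])
print([(d['id'], g) for d, g in pairs])  # [('id1', 0), ('id2', 0)]
```

Looking values up in a list is linear, so for large inputs a dict mapping the serialized value to its group ID would likely be a better fit.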


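User

#2 · edited 2024-05-14 22:05:28

I believe this works. It is written in Python 3 and I have not optimized it, but if it turns out not to be fast enough it should be a good starting point.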

list_of_dicts = [
{'id':'id1', 'key1':'value_x', 'key2': 'value_y', 'key3':'value_z'},
{'id':'id3', 'key2' :'value_u', 'key3': 'value_v'},
{'id':'id2', 'key1':'value_x', 'key3':'value_z', 'key4': 'value_t'},
{'id':'id4', 'key1':'value_w', 'key2':'value_s', 'key3':'value_v'}
]

# Dicts are not hashable, so build a string from the values we are grouping on and use that as the key.
def make_value_key(d, list_of_keys):
    res = ""
    for k in list_of_keys:
        res += str(d[k]) 
    return res

def group_dictionary(list_of_dicts, list_of_keys):
    group_vals = {}
    current_max_group = 0
    dicts_to_remove = []
    for i,d in enumerate(list_of_dicts):
        # If dict doesn't have all keys mark for removal.
        if not all(k in d for k in list_of_keys):
            dicts_to_remove.append(i)
        else:
            value_key = make_value_key(d, list_of_keys)
            # If value key exists assign group otherwise make new group.
            if value_key in group_vals:
                d['group'] = group_vals[value_key]
            else:
                group_vals[value_key] = current_max_group
                d['group'] = current_max_group
                current_max_group += 1

    list_of_dicts = [i for j, i in enumerate(list_of_dicts) if j not in dicts_to_remove]
    return list_of_dicts

list_of_keys=['key1','key3']

print(group_dictionary(list_of_dicts, list_of_keys))
print()
list_of_keys=['key3']

print(group_dictionary(list_of_dicts, list_of_keys))

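Output (as produced by the code above on Python 3.7+, where dicts keep insertion order):

[{'id': 'id1', 'key1': 'value_x', 'key2': 'value_y', 'key3': 'value_z', 'group': 0}, {'id': 'id2', 'key1': 'value_x', 'key3': 'value_z', 'key4': 'value_t', 'group': 0}, {'id': 'id4', 'key1': 'value_w', 'key2': 'value_s', 'key3': 'value_v', 'group': 1}]

[{'id': 'id1', 'key1': 'value_x', 'key2': 'value_y', 'key3': 'value_z', 'group': 0}, {'id': 'id3', 'key2': 'value_u', 'key3': 'value_v', 'group': 1}, {'id': 'id2', 'key1': 'value_x', 'key3': 'value_z', 'key4': 'value_t', 'group': 0}, {'id': 'id4', 'key1': 'value_w', 'key2': 'value_s', 'key3': 'value_v', 'group': 1}]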
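One caveat with `make_value_key`: concatenating the values with no separator can merge combinations that are actually different. A possible alternative, sketched here on the assumption that all grouped values are hashable, is to use a tuple as the dict key instead of a string:

```python
def make_value_key(d, list_of_keys):
    # A tuple keeps each value separate, so ('ab', 'c') and ('a', 'bc')
    # stay distinct; plain concatenation would turn both into 'abc'.
    return tuple(d[k] for k in list_of_keys)


a = {'key1': 'ab', 'key3': 'c'}
b = {'key1': 'a', 'key3': 'bc'}
keys = ['key1', 'key3']

concatenated_a = ''.join(str(a[k]) for k in keys)
concatenated_b = ''.join(str(b[k]) for k in keys)
print(concatenated_a == concatenated_b)                    # True: collision
print(make_value_key(a, keys) == make_value_key(b, keys))  # False
```

The rest of `group_dictionary` should not need to change, since it only uses the key for dict lookups.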

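Optimization 1:

Instead of iterating over all the keys up front to check that they exist, we can fail while building the value key and return an empty string, which marks the dict for removal: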

def make_value_key(d, list_of_keys):
    res = ""
    for k in list_of_keys:
        # A missing key means this dict cannot be grouped; "" signals removal.
        if k not in d:
            return ""
        res += str(d[k])
    return res

def group_dictionary(list_of_dicts, list_of_keys):
    group_vals = {}
    current_max_group = 0
    dicts_to_remove = []
    for i,d in enumerate(list_of_dicts):
        value_key = make_value_key(d, list_of_keys)
        if value_key == "":
            dicts_to_remove.append(i)
            continue
        if value_key in group_vals:
            d['group'] = group_vals[value_key]
        else:
            group_vals[value_key] = current_max_group
            d['group'] = current_max_group
            current_max_group += 1

    list_of_dicts = [i for j, i in enumerate(list_of_dicts) if j not in dicts_to_remove]
    return list_of_dicts
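Note that using `""` as the "remove me" marker also drops dicts whose grouped values are all empty strings. If that matters for your data, a variant along these lines might help. It is only a sketch: returning `None` as the sentinel, the tuple key, and the `setdefault` shortcut are my substitutions, not part of the code above:

```python
def make_value_key(d, list_of_keys):
    """Return a tuple key, or None if any key is missing from d."""
    values = []
    for k in list_of_keys:
        if k not in d:
            return None
        values.append(d[k])
    return tuple(values)


def group_dictionary(list_of_dicts, list_of_keys):
    group_vals = {}
    kept = []
    for d in list_of_dicts:
        value_key = make_value_key(d, list_of_keys)
        if value_key is None:
            continue  # missing a key: drop this dict
        # setdefault stores the next group id the first time a key is seen
        d['group'] = group_vals.setdefault(value_key, len(group_vals))
        kept.append(d)
    return kept


rows = [{'id': 'a', 'key1': ''}, {'id': 'b'}, {'id': 'c', 'key1': ''}]
print(group_dictionary(rows, ['key1']))
# [{'id': 'a', 'key1': '', 'group': 0}, {'id': 'c', 'key1': '', 'group': 0}]
```

Collecting the kept dicts directly also avoids the separate list of indices to remove.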

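Groups must have more than one member:

This uses a second dict to keep track of group sizes, then checks whether a group has fewer than 2 members in order to mark it for removal.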

def make_value_key(d, list_of_keys):
    res = ""
    for k in list_of_keys:
        if k not in d:
            return ""
        res += str(d[k]) 
    return res

def group_dictionary(list_of_dicts, list_of_keys):
    group_vals = {}
    group_count = {}  # group id -> index of its first member, kept only while the group has one member
    current_max_group = 0
    indices_to_remove = []
    for i,d in enumerate(list_of_dicts):
        value_key = make_value_key(d, list_of_keys)
        if value_key == "":
            indices_to_remove.append(i)
            continue
        if value_key in group_vals:
            d['group'] = group_vals[value_key]
            # Second group member seen, remove from count dict. 
            group_count.pop(d['group'], None)
        else:
            group_vals[value_key] = current_max_group
            d['group'] = current_max_group
            # First time seen, add to count dict.
            group_count[current_max_group] = i
            current_max_group += 1

    indices_to_remove.extend(group_count.values())
    return [i for j, i in enumerate(list_of_dicts) if j not in indices_to_remove]

Output:

[{'key2': 'value_y', 'group': 0, 'id': 'id1', 'key1': 'value_x', 'key3': 'value_z'}, 
{'key4': 'value_t', 'group': 0, 'id': 'id2', 'key1': 'value_x', 'key3': 'value_z'}]

[{'key2': 'value_y', 'group': 0, 'id': 'id1', 'key1': 'value_x', 'key3': 'value_z'}, {'group': 1, 'id': 'id3', 'key2': 'value_u', 'key3': 'value_v'}, {'key4': 'value_t', 'group': 0, 'id': 'id2', 'key1': 'value_x', 'key3': 'value_z'}, {'key2': 'value_s', 'group': 1, 'id': 'id4', 'key1': 'value_w', 'key3': 'value_v'}]
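If the "remove on second sighting" trick feels too subtle, counting group sizes explicitly is another option. The following is a sketch under my own assumptions: it uses `collections.Counter`, a hypothetical `min_size` parameter, tuple keys, and it returns copies instead of modifying the input dicts. Unlike the version above, group IDs here are numbered only among the surviving groups:

```python
from collections import Counter


def group_dictionary(list_of_dicts, list_of_keys, min_size=2):
    # First pass: build a tuple key for every dict that has all the keys.
    keyed = [
        (tuple(d[k] for k in list_of_keys), d)
        for d in list_of_dicts
        if all(k in d for k in list_of_keys)
    ]
    sizes = Counter(key for key, _ in keyed)

    # Second pass: number the surviving groups in order of first appearance.
    group_ids = {}
    result = []
    for key, d in keyed:
        if sizes[key] < min_size:
            continue  # group too small
        group_id = group_ids.setdefault(key, len(group_ids))
        result.append({**d, 'group': group_id})  # copy; input is not mutated
    return result


list_of_dicts = [
    {'id': 'id1', 'key1': 'value_x', 'key2': 'value_y', 'key3': 'value_z'},
    {'id': 'id3', 'key2': 'value_u', 'key3': 'value_v'},
    {'id': 'id2', 'key1': 'value_x', 'key3': 'value_z', 'key4': 'value_t'},
    {'id': 'id4', 'key1': 'value_w', 'key2': 'value_s', 'key3': 'value_v'},
]
grouped = group_dictionary(list_of_dicts, ['key1', 'key3'])
print([(d['id'], d['group']) for d in grouped])  # [('id1', 0), ('id2', 0)]
```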

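Optimization 2:

The final list comprehension above tests `j not in indices_to_remove` for every dict, and each test scans the removal list, so with n dicts and m removed indices the filtering step costs O(n*m), which is O(n^2) in the worst case. Instead, you can sort the m removed indices in descending order (O(m log m)) and delete them in place. Keep in mind that each `del` on a list is itself linear in the list length, so this pays off mainly when m is small, and that the caller's list is now modified: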

def make_value_key(d, list_of_keys):
    res = ""
    for k in list_of_keys:
        if k not in d:
            return ""
        res += str(d[k]) 
    return res

def group_dictionary(list_of_dicts, list_of_keys):
    group_vals = {}
    group_count = {}
    current_max_group = 0
    indices_to_remove = []
    for i,d in enumerate(list_of_dicts):
        value_key = make_value_key(d, list_of_keys)
        if value_key == "":
            indices_to_remove.append(i)
            continue
        if value_key in group_vals:
            d['group'] = group_vals[value_key]
            # Second group member seen, remove from count dict. 
            group_count.pop(d['group'], None)
        else:
            group_vals[value_key] = current_max_group
            d['group'] = current_max_group
            # First time seen, add to count dict.
            group_count[current_max_group] = i
            current_max_group += 1

    indices_to_remove.extend(group_count.values())
    for index in sorted(indices_to_remove, reverse=True):
        del list_of_dicts[index]

    return list_of_dicts
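Another way to get the filtering step down to a single linear pass, without deleting in place, is to turn the removal list into a set first. This is a small sketch; `drop_indices` is a hypothetical helper name, not something from the code above:

```python
def drop_indices(items, indices_to_remove):
    """Return a new list without the items at the given indices."""
    remove = set(indices_to_remove)  # O(1) average membership test
    return [item for i, item in enumerate(items) if i not in remove]


print(drop_indices(['a', 'b', 'c', 'd'], [1, 3]))  # ['a', 'c']
```

In `group_dictionary` this would replace the sorted `del` loop with `return drop_indices(list_of_dicts, indices_to_remove)`, leaving the caller's list untouched.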
