Split dictionary items into smaller dictionaries based on a condition

Posted 2024-06-16 12:00:23


I have two lists: one contains partial transactions, the other contains their parent transactions:

partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']

I zip these lists together into a dictionary:

transactions = zip(partials, parents)

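As you can see, some partial transactions share the same parent transaction.

I need to split the dictionary's items into smaller groups (smaller dictionaries?) so that each group contains at most one transaction belonging to any given parent. For example, all transactions with parent "a" need to end up in different groups.

I also need as few groups as possible, because in the real world each group will be a file that is uploaded manually.

The expected output is as follows: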

Group 1 would contain transactions 1a, 2b, 3c, 4d, 7f

Group 2 would contain transactions 5a, 6d, 8c

Group 3 would contain transactions 9c, 10a

I have been racking my brain over this for a while and would appreciate any suggestions. So far I have no working code to post.


3 Answers

Here is one approach:

def bin_unique(partials, parents):
    bins = []
    for (ptx,par) in zip(partials, parents):
        pair_assigned = False
        # Try to find an existing bin that doesn't contain the parent.
        for bin_contents in bins:
            if par not in bin_contents:
                bin_contents[par] = (ptx, par)
                pair_assigned = True
                break
        # If we haven't been able to assign the pair, create a new bin
        #   (with the pair as its first entry)
        if not pair_assigned:
            bins.append({par: (ptx, par)})

    return bins

Usage

partials = [1,2,3,4,5,6,7,8,9,10]
parents = ['a','b','c','d','a','d','f','c','c','a']
binned = bin_unique(partials, parents)

Output

# Print the list of all dicts
print(binned)
# [
#   {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}, 
#   {'a': (5, 'a'), 'd': (6, 'd'), 'c': (8, 'c')}, 
#   {'c': (9, 'c'), 'a': (10, 'a')}
# ]

# You can access the bins via index
print(binned[0])            # {'a': (1, 'a'), 'b': (2, 'b'), 'c': (3, 'c'), 'd': (4, 'd'), 'f': (7, 'f')}
print(len(binned))          # 3

# Each bin is a dictionary, keyed by parent, but the values are the (partial, parent) pair
print(binned[0].keys())     # dict_keys(['a', 'b', 'c', 'd', 'f'])
print(binned[0].values())   # dict_values([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')])

# To show that all the transactions exist
all_pairs = [pair for b in binned for pair in b.values()]
print(sorted(all_pairs) == sorted(zip(partials, parents)))  # True
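
On the "as few groups as possible" requirement: no valid grouping can use fewer groups than the number of times the most frequent parent occurs, since each of its transactions needs a group of its own. The sketch below computes that lower bound with collections.Counter; min_groups_needed is a hypothetical helper name, not part of the answer above. For the sample data the bound is 3, which matches the three bins printed above.

```python
from collections import Counter

def min_groups_needed(parents):
    # Hypothetical helper: the most frequent parent forces one group
    # per occurrence, so its count is a lower bound on the group count.
    return max(Counter(parents).values(), default=0)

parents = ['a', 'b', 'c', 'd', 'a', 'd', 'f', 'c', 'c', 'a']
print(min_groups_needed(parents))  # 3
```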

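@barciewicz, based on the input and expected output you provided, I also tried to solve this problem in my own way.

Note: I used OrderedDict from the collections module to retain the order of keys in the dictionary. I also used the json module to pretty-print the dictionaries, lists, etc.

I show three different forms of the result, produced by three separate functions, as follows.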

{ "group1": "1a,2b,3c,4d,7f", "group2": "5a,8c,6d", "group3": "10a,9c" }


{ "group1": { "a": 1, "b": 2, "c": 3, "d": 4, "f": 7 }, "group2": { "a": 5, "c": 8, "d": 6 }, "group3": { "a": 10, "c": 9 } }


{ "a": [ 1, 5, 10 ], "b": [ 2 ], "c": [ 3, 8, 9 ], "d": [ 4, 6 ], "f": [ 7 ] }


Try the code below online at http://rextester.com/OYFF74927.

from collections import OrderedDict
import json

def get_formatted_transactions(parents, partials):
    # Map each parent to the list of its partials, preserving first-seen order.
    d = OrderedDict()

    for index, partial in enumerate(partials):
        if parents[index] in d:
            d[parents[index]].append(partial)
        else:
            d[parents[index]] = [partial]

    return d


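def get_groups(transactions):
    # Note: this empties the transactions mapping as it builds the groups.
    i = 1
    groups = OrderedDict()

    while transactions:
        group_name = "group" + str(i)
        groups[group_name] = {}
        # Copy the keys, since exhausted parents are deleted during the loop.
        keys = list(transactions.keys())

        for key in keys:
            if transactions[key]:
                # Move the next partial of this parent into the current group.
                groups[group_name][key] = transactions[key].pop(0)
                if not transactions[key]:
                    del transactions[key]
            else:
                del transactions[key]
        i += 1

    return groups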

def get_comma_separated_data(groups):
    new_dict = OrderedDict()
    for group_name in groups:
        d = groups[group_name]
        new_dict[group_name] = ",".join(str(value) + key for key, value in d.items())

    return new_dict



# Starting point
if __name__ == "__main__":
    partials = [1,2,3,4,5,6,7,8,9,10]
    parents = ['a','b','c','d','a','d','f','c','c','a']

    transactions = get_formatted_transactions(parents, partials)
    # Pretty printing the ordered dictionary
    print(json.dumps(transactions, indent=4))

    print("\n")

    # Creating groups to organize transactions
    groups = get_groups(transactions)
    # Pretty printing
    print(json.dumps(groups, indent=4))

    print("\n")

    # Get comma separated form
    comma_separated_data = get_comma_separated_data(groups)
    # Pretty printing
    print(json.dumps(comma_separated_data, indent=4))

Output:
{
    "a": [
        1,
        5,
        10
    ],
    "b": [
        2
    ],
    "c": [
        3,
        8,
        9
    ],
    "d": [
        4,
        6
    ],
    "f": [
        7
    ]
}

{
    "group1": {
        "a": 1,
        "b": 2,
        "c": 3,
        "d": 4,
        "f": 7
    },
    "group2": {
        "a": 5,
        "c": 8,
        "d": 6
    },
    "group3": {
        "a": 10,
        "c": 9
    }
}

{
    "group1": "1a,2b,3c,4d,7f",
    "group2": "5a,8c,6d",
    "group3": "10a,9c"
}

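One approach is simply to keep track of how many times you have seen a given parent. The first time you see parent "a", add that partial/parent pair to the first group; the second time, add it to the second group; and so on.

For example: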

def split_into_groups(transactions):
    counts = {}
    out_groups = {}
    for partial, parent in transactions:
        counts[parent] = target = counts.get(parent, 0) + 1
        out_groups.setdefault(target, {})[partial] = parent
    return out_groups

gives me

In [9]: split_into_groups(zip(partials, parents))
Out[9]: 
{1: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 7: 'f'},
 2: {5: 'a', 6: 'd', 8: 'c'},
 3: {9: 'c', 10: 'a'}}

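This uses counts.get to fetch a default value of 0 when the parent has no count yet, and out_groups.setdefault to create a default empty dictionary and store it in out_groups when the target count has not been seen yet.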
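
To see the two dictionary idioms in isolation, here is a minimal sketch of a single step of the loop, written out by hand. The variable names first, group and same_group are only for illustration.

```python
counts = {}
out_groups = {}

# dict.get returns the fallback (0) for a missing key without storing it.
first = counts.get('a', 0) + 1
counts['a'] = first

# dict.setdefault stores and returns the default when the key is missing...
group = out_groups.setdefault(first, {})
group[1] = 'a'
# ...and returns the already stored value when the key is present.
same_group = out_groups.setdefault(first, {})

print(counts)                # {'a': 1}
print(out_groups)            # {1: {1: 'a'}}
print(same_group is group)   # True
```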

If you have to handle duplicate partials, you can replace the setdefault line with

out_groups.setdefault(target, []).append((partial, parent))

which turns the group members into a list of tuples instead of a dictionary:

{1: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')],
 2: [(5, 'a'), (6, 'd'), (8, 'c')],
 3: [(9, 'c'), (10, 'a')]}
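
Before uploading the groups, it may be worth checking that no group repeats a parent. The sketch below does that for the list-of-tuples form shown above; is_valid_grouping is a hypothetical helper name and assumes each group is a list of (partial, parent) pairs.

```python
def is_valid_grouping(groups):
    # Hypothetical helper: True if no group contains the same parent twice.
    for members in groups.values():
        parents_in_group = [parent for _, parent in members]
        if len(parents_in_group) != len(set(parents_in_group)):
            return False
    return True

groups = {1: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'f')],
          2: [(5, 'a'), (6, 'd'), (8, 'c')],
          3: [(9, 'c'), (10, 'a')]}
print(is_valid_grouping(groups))  # True
```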
