Building a dictionary of matching values

Posted 2024-04-25 07:45:22


I'm having trouble building a dictionary from a list, grouping its entries on several matching fields.

Here is a sample list:

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf","123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf","123", "plane", "town"]]

What I want to do is match the sublists on their last three entries and build a dictionary from the resulting groups.

So, given the list above, I'd expect the output to be:

{1: [["1.pdf", "123", "train", "plaza"],
     ["5.pdf", "123", "train", "plaza"]],
 2: [["2.pdf", "123", "plane", "town"],
     ["6.pdf", "123", "plane", "town"]],
 3: [["3.pdf", "456", "train", "plaza"]],
 4: [["4.pdf", "123", "plane", "city"]]}

3 answers

You can use collections.defaultdict:

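>>> from collections import defaultdict
>>> from pprint import pprint
>>> dic = defaultdict(list)
>>> for item in items:
...     # key on the last three fields; tuples are hashable, lists are not
...     dic[tuple(item[1:])].append(item)
...
>>> ans = {i: group for i, group in enumerate(dic.values(), 1)}
>>> pprint(ans)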
{1: [['1.pdf', '123', 'train', 'plaza'], ['5.pdf', '123', 'train', 'plaza']],
 2: [['2.pdf', '123', 'plane', 'town'], ['6.pdf', '123', 'plane', 'town']],
 3: [['4.pdf', '123', 'plane', 'city']],
 4: [['3.pdf', '456', 'train', 'plaza']]}

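If order matters, use collections.OrderedDict (on Python 3.7+, dict and defaultdict already preserve insertion order, so this is needed only on older interpreters):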

>>> from collections import OrderedDict
>>> from pprint import pprint
>>> dic = OrderedDict()
>>> for item in items:
...     dic.setdefault(tuple(item[1:]), []).append(item)
...
>>> ans = {i: group for i, group in enumerate(dic.values(), 1)}
>>> pprint(ans)
{1: [['1.pdf', '123', 'train', 'plaza'], ['5.pdf', '123', 'train', 'plaza']],
 2: [['2.pdf', '123', 'plane', 'town'], ['6.pdf', '123', 'plane', 'town']],
 3: [['3.pdf', '456', 'train', 'plaza']],
 4: [['4.pdf', '123', 'plane', 'city']]}
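
For reference, the same grouping can be written as a standalone script. This is only a minimal sketch, assuming Python 3.7 or later (where a plain dict keeps insertion order); the function name group_by_tail is hypothetical and does not come from the answer above.

```python
def group_by_tail(rows):
    """Group rows on every field except the first, numbering groups from 1."""
    groups = {}
    for row in rows:
        # Lists are unhashable, so the grouping key has to be a tuple.
        groups.setdefault(tuple(row[1:]), []).append(row)
    # Number the groups in order of first appearance.
    return {i: group for i, group in enumerate(groups.values(), 1)}


items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]

result = group_by_tail(items)
```

Keeping the tuple key in an intermediate dict and assigning the numbers only at the end means a row never has to be compared against every existing group.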

What you're looking for is a groupby operation. If you're using pandas:

In [1]: import pandas as pd

In [2]: items
Out[2]: 
[['1.pdf', '123', 'train', 'plaza'],
 ['2.pdf', '123', 'plane', 'town'],
 ['3.pdf', '456', 'train', 'plaza'],
 ['4.pdf', '123', 'plane', 'city'],
 ['5.pdf', '123', 'train', 'plaza'],
 ['6.pdf', '123', 'plane', 'town']]

In [3]: df = pd.DataFrame.from_records(items)

In [4]: df
Out[4]: 
       0    1      2      3
0  1.pdf  123  train  plaza
1  2.pdf  123  plane   town
2  3.pdf  456  train  plaza
3  4.pdf  123  plane   city
4  5.pdf  123  train  plaza
5  6.pdf  123  plane   town


In [5]: for n, g in df.groupby([1, 2, 3]):
   ...:     print("name", n)
   ...:     print(g)
   ...:
name ('123', 'plane', 'city')
       0    1      2     3
3  4.pdf  123  plane  city
name ('123', 'plane', 'town')
       0    1      2     3
1  2.pdf  123  plane  town
5  6.pdf  123  plane  town
name ('123', 'train', 'plaza')
       0    1      2      3
0  1.pdf  123  train  plaza
4  5.pdf  123  train  plaza
name ('456', 'train', 'plaza')
       0    1      2      3
2  3.pdf  456  train  plaza
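
If pandas is not available, the standard library's itertools.groupby can do a comparable grouping. The following is a rough sketch rather than part of the answer above; the names key_fields and grouped are illustrative. One caveat: itertools.groupby only merges consecutive items, so the rows have to be sorted on the same key first.

```python
from itertools import groupby
from operator import itemgetter

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]

# itemgetter(1, 2, 3) returns a tuple of the last three fields of a row.
key_fields = itemgetter(1, 2, 3)

# Sort first: groupby only groups runs of adjacent rows with equal keys.
# sorted() is stable, so rows keep their original order within each group.
grouped = {key: list(rows)
           for key, rows in groupby(sorted(items, key=key_fields), key=key_fields)}

for key, rows in grouped.items():
    print("name", key)
    print(rows)
```

As with the pandas version, the groups come out in sorted key order rather than in order of first appearance.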

May I suggest a different output data format?

from collections import defaultdict
d = defaultdict(list)

for item in items:
    d[tuple(item[1:])].append(item[0])

This produces a structure like:

{
    ('123', 'train', 'plaza'): ['1.pdf', '5.pdf'],
    ('123', 'plane', 'town'):  ['2.pdf', '6.pdf'],
    ('123', 'plane', 'city'):  ['4.pdf'],
    ('456', 'train', 'plaza'): ['3.pdf']
}
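
To illustrate how this tuple-keyed layout might be used, here is a small sketch, assuming Python 3.7+ for the ordering of the numbered result. The names files and numbered are illustrative only. It looks up one combination of fields and, if the numbered layout from the question is still needed, rebuilds it from the same dictionary.

```python
from collections import defaultdict

items = [["1.pdf", "123", "train", "plaza"],
         ["2.pdf", "123", "plane", "town"],
         ["3.pdf", "456", "train", "plaza"],
         ["4.pdf", "123", "plane", "city"],
         ["5.pdf", "123", "train", "plaza"],
         ["6.pdf", "123", "plane", "town"]]

d = defaultdict(list)
for item in items:
    d[tuple(item[1:])].append(item[0])

# Look up the files sharing one combination of fields.
# .get() avoids creating an empty entry when the key is missing.
files = d.get(("123", "train", "plaza"), [])

# Rebuild full rows under numeric keys, numbered by first appearance.
numbered = {i: [[name, *key] for name in names]
            for i, (key, names) in enumerate(d.items(), 1)}
```

The tuple keys make direct lookups possible, which the numbered layout cannot offer without scanning every group.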
