How to efficiently group values in a dictionary in Python

3 Answers

User

#1 · edited 2024-06-06 22:28:01

This should be faster than comparing every element, as your current code does.

mydata = {'data mining': ['data', 'text mining', 'artificial intelligence'], 'neural networks': ['cnn', 'rnn', 'artificial intelligence'], 'data': [ 'text mining', 'artificial intelligence','data']}

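compared_values = set()   # comparison strings seen so far
referencekeys = {}        # comparison string -> key currently kept for it
myresults = {}

# Sort the strings of a value list and join them into one comparison string.
comparator = lambda x : ''.join(sorted(x))

for key, value in mydata.items():
    compvalue = comparator(value)
    if not set([compvalue]).issubset(compared_values):
        # First time this combination of values is seen: keep it.
        compared_values.update([compvalue])
        referencekeys[compvalue] = key
        myresults[key] = value
    else:
        # Same values seen before: keep only the longer key.
        if len(key) > len(referencekeys[compvalue]):
            myresults[key] = myresults.pop(referencekeys[compvalue])
            referencekeys[compvalue] = key

print(myresults)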

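Here I define a comparator that sorts the strings in each value list and joins them into a single string. I'm not sure whether it is more efficient than your approach with Counter.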
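One caveat with a joined string as the comparison value: joining without a separator can make two different lists look identical. A minimal sketch, assuming the list items are strings; `join_comparator` and `tuple_comparator` are names introduced here for illustration and are not part of the answer:

```python
# The comparator from the answer: sort the strings and join them.
join_comparator = lambda x: ''.join(sorted(x))

# Hypothetical alternative: keep the sorted items separate in a tuple.
# Tuples are hashable, so they also work as set members and dict keys.
tuple_comparator = lambda x: tuple(sorted(x))

a = ['ab', 'c']
b = ['a', 'bc']

print(join_comparator(a) == join_comparator(b))    # True: both become 'abc'
print(tuple_comparator(a) == tuple_comparator(b))  # False: the items stay distinct
```

For the sample data in this thread both comparators give the same grouping; the difference only matters when list items can concatenate into the same string.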

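I loop over the dictionary once and store the strings produced by the comparator in a set(). On each iteration I check whether the new comparator string is already in the set. If it is not, I add it to the set for future reference and add the key-value pair to the final result dictionary. Otherwise I compare the key lengths and, if the new key is longer, rename the dict key by popping the old entry and reassigning its value under the new key. I also need a second dictionary that maps the other way round (compvalue is the key, key is the value) to keep track of which key currently represents each comparison value.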

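It should be faster (I have not timed it) because there is only a single loop. The equivalent of the second loop is set([compvalue]).issubset(compared_values), and a set is more efficient than a for loop for this kind of job.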
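The check set([compvalue]).issubset(compared_values) builds a one-element set on every iteration; the `in` operator asks the same question directly. A small sketch showing that the two agree (the sample values are made up for illustration):

```python
compared_values = {'abc', 'xyz'}  # made-up sample values

for compvalue in ['abc', 'nope']:
    via_issubset = set([compvalue]).issubset(compared_values)
    via_in = compvalue in compared_values  # hash lookup, no temporary set
    print(compvalue, via_issubset, via_in)
    assert via_issubset == via_in
```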

Give it a try and see if it helps.

Edit

Another similar idea, this time without a set, just came to mind.

referencekeys = {}
myresults = {}

comparator = lambda x : ''.join(sorted(x))

for key, value in mydata.items():
    compvalue = comparator(value)
    try:
        if len(key) > len(referencekeys[compvalue]):
            myresults[key] = myresults.pop(referencekeys[compvalue])
            referencekeys[compvalue] = key
    except KeyError:
        referencekeys[compvalue] = key
        myresults[key] = value

print(myresults)

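Here I simply wrap the if statement in a try block. If referencekeys[compvalue] raises a KeyError, the code has not yet seen an equivalent value. Otherwise, it compares the key lengths.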

Again, I have not checked the execution time, so I am not sure which version is more efficient. The result is the same, though.
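Since neither version was timed, here is a rough way to compare them with the standard `timeit` module. This is only a sketch: the function names `with_set` and `with_try` are introduced here to wrap the logic of the two snippets above, and timings on a three-key dictionary say little about large inputs.

```python
import timeit

mydata = {
    'data mining': ['data', 'text mining', 'artificial intelligence'],
    'neural networks': ['cnn', 'rnn', 'artificial intelligence'],
    'data': ['text mining', 'artificial intelligence', 'data'],
}

comparator = lambda x: ''.join(sorted(x))

def with_set(data):
    # Logic of the first snippet, using `in` for the membership test.
    compared_values, referencekeys, myresults = set(), {}, {}
    for key, value in data.items():
        compvalue = comparator(value)
        if compvalue not in compared_values:
            compared_values.add(compvalue)
            referencekeys[compvalue] = key
            myresults[key] = value
        elif len(key) > len(referencekeys[compvalue]):
            myresults[key] = myresults.pop(referencekeys[compvalue])
            referencekeys[compvalue] = key
    return myresults

def with_try(data):
    # Logic of the second snippet: a KeyError marks an unseen value.
    referencekeys, myresults = {}, {}
    for key, value in data.items():
        compvalue = comparator(value)
        try:
            if len(key) > len(referencekeys[compvalue]):
                myresults[key] = myresults.pop(referencekeys[compvalue])
                referencekeys[compvalue] = key
        except KeyError:
            referencekeys[compvalue] = key
            myresults[key] = value
    return myresults

# Both versions should agree before their timings are compared.
assert with_set(mydata) == with_try(mydata)
print(timeit.timeit(lambda: with_set(mydata), number=10_000))
print(timeit.timeit(lambda: with_try(mydata), number=10_000))
```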

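Edit 2

Following the request in the comments: to keep empty lists as they are, it is enough to wrap the loop body in an if statement (here I use the first snippet, but the same idea works with the second one).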

for key, value in mydata.items():
    if len(value) > 0:
        compvalue = comparator(value)
        if not set([compvalue]).issubset(compared_values):
            compared_values.update([compvalue])
            referencekeys[compvalue] = key
            myresults[key] = value
        else:
            if len(key) > len(referencekeys[compvalue]):
                myresults[key] = myresults.pop(referencekeys[compvalue])
                referencekeys[compvalue] = key
    else:
        myresults[key] = value

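If len(value) == 0, there is no need to store the key in referencekeys. If the original data mydata is a single dictionary, its keys are unique, so you are guaranteed not to overwrite anything.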

For example, if you have

mydata = {'data mining': ['data', 'text mining', 'artificial intelligence'], 'neural networks': ['cnn', 'rnn', 'artificial intelligence'], 'data': [ 'text mining', 'artificial intelligence','data'], 'data bis':[], 'neural link':[]}

you will get:

myresults = {'data mining': ['data', 'text mining', 'artificial intelligence'], 'neural networks': ['cnn', 'rnn', 'artificial intelligence'], 'data bis': [], 'neural link': []}
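The same guard applied to the second (try/except) snippet might look like the following. This is a sketch, not the answerer's own code, and it reuses the example data with the two empty lists:

```python
mydata = {
    'data mining': ['data', 'text mining', 'artificial intelligence'],
    'neural networks': ['cnn', 'rnn', 'artificial intelligence'],
    'data': ['text mining', 'artificial intelligence', 'data'],
    'data bis': [],
    'neural link': [],
}

referencekeys = {}
myresults = {}
comparator = lambda x: ''.join(sorted(x))

for key, value in mydata.items():
    if not value:
        # Empty lists are kept as they are and never compared.
        myresults[key] = value
        continue
    compvalue = comparator(value)
    try:
        if len(key) > len(referencekeys[compvalue]):
            myresults[key] = myresults.pop(referencekeys[compvalue])
            referencekeys[compvalue] = key
    except KeyError:
        referencekeys[compvalue] = key
        myresults[key] = value

print(myresults)
```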

User
#2 · edited 2024-06-06 22:28:01

You can first sort the dictionary by key length, which guarantees that longer keys come first.

from itertools import groupby

d = {
    "data mining": ["data", "text mining", "artificial intelligence"],
    "neural networks": ["cnn", "rnn", "artificial intelligence"],
    "data": ["text mining", "artificial intelligence", "data"],
}

result = dict(
    g
    for k, (g, *_) in groupby(
        sorted(d.items(), key=lambda x: len(x[0]), reverse=True),
        key=lambda x: sorted(x[1]),
    )
)

It's also a one-liner, which is always nice! :)
Printing result gives:
{'neural networks': ['cnn', 'rnn', 'artificial intelligence'], 'data mining': ['data', 'text mining', 'artificial intelligence']}
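
One thing to keep in mind: `itertools.groupby` only merges items that are adjacent, so sorting by key length alone works for this sample but would not merge equal value lists that end up separated by another entry. A possible variant, offered as a sketch, sorts by the sorted value list first and by key length (longest first) second. The key 'data mining extended' is made up here so that a length-only sort would separate the two matching entries:

```python
from itertools import groupby

d = {
    "data": ["text mining", "artificial intelligence", "data"],
    "neural networks": ["cnn", "rnn", "artificial intelligence"],
    "data mining extended": ["data", "text mining", "artificial intelligence"],  # made-up key
}

# Sort so that equal value lists are adjacent, longest key first within each run.
ordered = sorted(d.items(), key=lambda x: (sorted(x[1]), -len(x[0])))

# Keep only the first (longest-key) item of every group.
result = dict(next(group) for _, group in groupby(ordered, key=lambda x: sorted(x[1])))

print(result)
```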

User
#3 · edited 2024-06-06 22:28:01

Python built-in types to the rescue!

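# Sample input (same data as in the first answer); replace it with your own dict.
data = {'data mining': ['data', 'text mining', 'artificial intelligence'], 'neural networks': ['cnn', 'rnn', 'artificial intelligence'], 'data': ['text mining', 'artificial intelligence', 'data']}

tmp = dict()
for topic, words in data.items():
    ww = frozenset(words)  # hashable and order-insensitive, so usable as a dict key
    # Keep the longer of the topic stored so far and the current one.
    tmp[ww] = max(tmp.get(ww, topic), topic, key=len)
result = {topic: list(ww) for ww, topic in tmp.items()}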
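
Note that a `frozenset` ignores order and duplicates, so `list(ww)` may return the words in a different order than the input list. If the original list should be preserved, one option is to store the winning topic together with its word list. This is a sketch, with `data` assumed to be a dictionary shaped like the sample used in this thread:

```python
data = {
    'data mining': ['data', 'text mining', 'artificial intelligence'],
    'neural networks': ['cnn', 'rnn', 'artificial intelligence'],
    'data': ['text mining', 'artificial intelligence', 'data'],
}

tmp = {}
for topic, words in data.items():
    ww = frozenset(words)
    # Keep the (topic, words) pair whose topic name is longest.
    if ww not in tmp or len(topic) > len(tmp[ww][0]):
        tmp[ww] = (topic, words)

# Each stored value is a (topic, words) pair, so dict() rebuilds the mapping.
result = dict(tmp.values())
print(result)
```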
