Counting the number of redundant values across two dictionaries in Python

output = open("Output.txt", "w")
output.write('keydict1\tlength_value_dict1\tkeydict2\tlength_value_dict2\tNumber_of_overlap\n')
for key, value in dict1.items():
    len1 = len(dict1[key])  # gives length of the key
    for vals in value:  # to iterate over each of the values corresponding to key
        for key2, value2 in dict2.items():  # iterates over keys and values of second dictionary
            len2 = len(dict2[key2])
            counter = 0  # sets counter to 0
            for vals2 in value2:
                if vals == vals2:  # checks values if equal to each other
                    counter = counter + 1  # if it is equal, it adds 1 to the counter, then it is supposed to reset it when it gets to next key2
            # For some reason, I can't output the file in the command below unless the integers are converted to strings. Not sure if there is a better trick
            newline = key, str(len1), key2, str(len2), str(counter)
            output.write('\t'.join(newline) + "\n")

keydict1  length_value_dict1  keydict2  length_value_dict2  Number_of_overlap
Ex2       4                   lnc3      3                   1
Ex2       4                   lnc2      2                   1
Ex2       4                   lnc1      4                   0
Ex2       4                   lnc3      3                   0
Ex2       4                   lnc2      2                   0
Ex2       4                   lnc1      4                   1
Ex2       4                   lnc3      3                   0
Ex2       4                   lnc2      2                   0
Ex2       4                   lnc1      4                   1
Ex2       4                   lnc3      3                   0
Ex2       4                   lnc2      2                   0
Ex2       4                   lnc1      4                   1
Ex1       3                   lnc3      3                   1
Ex1       3                   lnc2      2                   0
Ex1       3                   lnc1      4                   1
Ex1       3                   lnc3      3                   1
Ex1       3                   lnc2      2                   0
Ex1       3                   lnc1      4                   0
Ex1       3                   lnc3      3                   0
Ex1       3                   lnc2      2                   1
Ex1       3                   lnc1      4                   0

1 answer

User

#1 · Posted 2024-04-26 23:17:11

Your algorithm should look like this:

for k1, v1 in dict1.items():
    for k2, v2 in dict2.items():
        # now find the number of items that appear in both v1 and v2

But as it is written now, your algorithm does this instead:

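for k1, v1 in dict1.items():
    for v in v1:
        for k2, v2 in dict2.items():
            # now count how many times v appears in v2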

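In effect, you are counting how many times a single item v from v1 appears in v2, which should be 0 or 1. Because of the for v in v1 loop, you check the redundancy between keys k1 and k2 several times, once per item, and write one output line each time.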
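
To make the difference concrete, here is a minimal sketch of the nested-loop count with the counter reset once per key pair rather than once per item. The helper name count_overlaps_nested is hypothetical, and the sample dictionaries are the ones used later in this answer.

```python
dict1 = {'Ex1': ['Spata1', 'D', 'E'], 'Ex2': ['Fgg', 'Wfdc2', 'F', 'G']}
dict2 = {'lnc3': ['Spata1', 'Fgg', 'D'], 'lnc2': ['Fgg', 'E'],
         'lnc1': ['Spata1', 'Wfdc2', 'F', 'G']}


def count_overlaps_nested(d1, d2):
    """Return {(k1, k2): number of equal item pairs}; hypothetical helper."""
    results = {}
    for k1, v1 in d1.items():
        for k2, v2 in d2.items():
            counter = 0  # reset once per (k1, k2) pair, not once per item
            for item1 in v1:
                for item2 in v2:
                    if item1 == item2:
                        counter += 1
            results[(k1, k2)] = counter
    return results


overlaps = count_overlaps_nested(dict1, dict2)
for (k1, k2), n in overlaps.items():
    print(k1, k2, n)
```

This yields exactly one count per key pair, so six results for the sample data instead of twenty-one lines.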

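Now let's go back to the original algorithm. We only want the number of elements in the intersection of the two lists v1 and v2. Since intersection is a set concept, we can simply use len(set(v1).intersection(v2)). Here is a simple snippet that does all of this, without the special output formatting.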

dict1 = {'Ex1': ['Spata1', 'D', 'E'], 'Ex2': ['Fgg', 'Wfdc2', 'F', 'G']}
dict2 = {'lnc3': ['Spata1', 'Fgg', 'D'], 'lnc2': ['Fgg', 'E'], 'lnc1': ['Spata1', 'Wfdc2', 'F', 'G']}

for k1, v1 in dict1.items():
    for k2, v2 in dict2.items():
        print('%3s %5d %10s %5d %5d' % (k1, len(v1), k2, len(v2), len(set(v1).intersection(v2))))

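Note that in older Python versions (before 3.7) dictionaries do not keep keys in the order you might expect, which is why Ex2 is listed before Ex1 in the output below. If you really need a particular order, there are ways to remedy this, such as iterating over sorted(dict1) or using collections.OrderedDict.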

Ex2     4       lnc3     3     1
Ex2     4       lnc2     2     1
Ex2     4       lnc1     4     3
Ex1     3       lnc3     3     2
Ex1     3       lnc2     2     1
Ex1     3       lnc1     4     1
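
The question also asks whether there is a better trick than converting every integer to a string before writing. One option, sketched here under the assumption of Python 3, is the standard csv module with a tab delimiter, which converts non-string fields itself. The helper names overlap_rows and write_overlap_table are hypothetical, and keys are sorted only to make the row order deterministic.

```python
import csv
import io

dict1 = {'Ex1': ['Spata1', 'D', 'E'], 'Ex2': ['Fgg', 'Wfdc2', 'F', 'G']}
dict2 = {'lnc3': ['Spata1', 'Fgg', 'D'], 'lnc2': ['Fgg', 'E'],
         'lnc1': ['Spata1', 'Wfdc2', 'F', 'G']}

HEADER = ['keydict1', 'length_value_dict1', 'keydict2',
          'length_value_dict2', 'Number_of_overlap']


def overlap_rows(d1, d2):
    """Yield one row per (k1, k2) pair; hypothetical helper."""
    for k1 in sorted(d1):
        for k2 in sorted(d2):
            v1, v2 = d1[k1], d2[k2]
            yield [k1, len(v1), k2, len(v2), len(set(v1).intersection(v2))]


def write_overlap_table(d1, d2, handle):
    """Write the header and all rows as tab-separated text; hypothetical helper."""
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    writer.writerow(HEADER)                  # integers need no str() here
    writer.writerows(overlap_rows(d1, d2))


# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
write_overlap_table(dict1, dict2, buffer)
table_text = buffer.getvalue()
print(table_text, end='')
```

To write to disk instead, the same function can be given a real file handle, for example one opened with open("Output.txt", "w", newline="") inside a with block.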

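If your lists contain duplicate values, using set intersection may distort your counts, because sets ignore duplicate elements. The traditional way to find the overlap in that case is to build a dictionary of counts for the elements of, say, v2, and then, for each item in v1, look up how many times it occurs in v2 and sum the results. In code: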

from collections import Counter

v2_counts = Counter(v2)
overlap = sum(v2_counts.get(v, 0) for v in v1)

The method get(key, default_value) tries to fetch the dictionary value stored under key; if the key is not present, it returns default_value.
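
As a small illustration of how duplicates change the result, the sketch below compares the set-based count with the Counter-based sum on made-up lists (the values in v1 and v2 are assumptions chosen for the example). It also shows a third possible definition, the multiset intersection via Counter's & operator, which is not part of the answer above but may be what you want if each occurrence should be matched at most once.

```python
from collections import Counter

# Made-up sample lists containing duplicates.
v1 = ['Fgg', 'Fgg', 'E', 'E']
v2 = ['Fgg', 'E', 'E', 'D']

# Set intersection: each distinct shared value counts once.
set_overlap = len(set(v1).intersection(v2))

# Counter-based sum: for every item of v1, add its number of occurrences in v2.
v2_counts = Counter(v2)
pairwise_overlap = sum(v2_counts.get(v, 0) for v in v1)

# Multiset intersection: each occurrence is matched at most once
# (minimum of the two counts per value).
multiset_overlap = sum((Counter(v1) & Counter(v2)).values())

print(set_overlap, pairwise_overlap, multiset_overlap)
```

Which of the three numbers is the right "overlap" depends on what a duplicate means in your data.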
