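Counting the items in the intersection of every combination of sets in a list
I have a list of sets, and for every combination of those sets I want to count the items that appear only in that intersection. Put simply, I want to compute the numbers you would write in each region of a Venn diagram.
A simple example should make the question clearer.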
a = set([1,2,5,10,12])
b = set([1,2,6,9,12,15])
c = set([1,2,7,8,15])
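What I should end up with is the number of items found only in:
- a
- b
- c
- the intersection of a and b
- the intersection of a and c
- the intersection of b and c
- the intersection of a, b and c
A rigid way to do this is: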
num_a = len(a - b - c) # len(set([5,10])) -> 2
num_b = len(b - a - c) # len(set([6,9])) -> 2
num_c = len(c - a - b) # len(set([7,8])) -> 2
num_ab = len((a & b) - c) # 1
num_ac = len((a & c) - b) # 0
num_bc = len((b & c) - a) # 1
num_abc = len(a & b & c) # 2
This works for 3 sets, but the number of sets I have is dynamic.
3 Answers
1
You can use itertools.combinations to get all possible combinations: http://docs.python.org/2/library/itertools.html
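As a minimal sketch of this suggestion, the snippet below walks over every non-empty combination of sets with itertools.combinations and counts the items that fall only in that combination. The helper name exclusive_counts and the use of list indices as labels are assumptions made for illustration; they are not part of the answer above.

```python
from itertools import combinations

def exclusive_counts(sets):
    """Map each tuple of set indices to the number of items found in
    exactly those sets and in no others (hypothetical helper)."""
    indices = range(len(sets))
    counts = {}
    for size in range(1, len(sets) + 1):
        for combo in combinations(indices, size):
            # Items common to every set in the combination.
            common = set.intersection(*(sets[i] for i in combo))
            # Items that appear in any set outside the combination.
            rest = set().union(*(sets[i] for i in indices if i not in combo))
            counts[combo] = len(common - rest)
    return counts

sets = [
    set([1, 2, 5, 10, 12]),
    set([1, 2, 6, 9, 12, 15]),
    set([1, 2, 7, 8, 15]),
]
for combo, count in sorted(exclusive_counts(sets).items()):
    print(combo, count)
```

Because the loop visits all 2**n - 1 combinations, this is only practical for a modest number of sets.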
1
I suggest trying bitmasks:
sets = [
    set([1, 2, 5, 10, 12]),
    set([1, 2, 6, 9, 12, 15]),
    set([1, 2, 7, 8, 15]),
]
# Map each item to a bitmask: bit n is set if the item is in sets[n].
d = {}
for n, s in enumerate(sets):
    for i in s:
        d[i] = d.get(i, 0) | (1 << n)
# For every non-empty combination of sets, count the items present in all of them.
for mask in range(1, 2**len(sets)):
    cnt = sum(1 for x in d.values() if x & mask == mask)
    num = ','.join(str(j) for j in range(len(sets)) if mask & (1 << j))
    print('number of items in set(s) %s = %d' % (num, cnt))
For your input, the output is:
number of items in set(s) 0 = 5
number of items in set(s) 1 = 6
number of items in set(s) 0,1 = 3
number of items in set(s) 2 = 5
number of items in set(s) 0,2 = 2
number of items in set(s) 1,2 = 3
number of items in set(s) 0,1,2 = 2
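Note that these numbers count every item that belongs to at least the listed sets, so set 0 reports all 5 items of the first set rather than the 2 that appear only there. One possible adaptation for the exclusive Venn-region counts asked about in the question is to tally the items whose mask equals the combination's mask exactly. The sketch below assumes the same list of sets; region_counts is a hypothetical name, not something from the answer.

```python
from collections import Counter

def region_counts(sets):
    """Count items per exact membership bitmask
    (bit n set <=> the item is in sets[n]). Hypothetical helper."""
    membership = {}
    for n, s in enumerate(sets):
        for item in s:
            membership[item] = membership.get(item, 0) | (1 << n)
    # Each item has exactly one mask, so tallying masks gives exclusive counts.
    return Counter(membership.values())

sets = [
    set([1, 2, 5, 10, 12]),
    set([1, 2, 6, 9, 12, 15]),
    set([1, 2, 7, 8, 15]),
]
counts = region_counts(sets)
for mask in range(1, 2 ** len(sets)):
    names = ','.join(str(j) for j in range(len(sets)) if mask & (1 << j))
    # A Counter returns 0 for masks that no item has.
    print('items only in set(s) %s = %d' % (names, counts[mask]))
```

This variant makes a single pass over the items to build the tally, so the cost of the final loop is just one lookup per combination.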
3
If I understand you correctly, something like this should work:
from itertools import combinations

def venn_count(named_sets):
    names = set(named_sets)
    for i in range(1, len(named_sets)+1):
        for to_intersect in combinations(sorted(named_sets), i):
            # Names of the sets that are not part of this combination.
            others = names.difference(to_intersect)
            intersected = set.intersection(*(named_sets[k] for k in to_intersect))
            unioned = set.union(*(named_sets[k] for k in others)) if others else set()
            # Keep only the items that appear in no other set.
            yield to_intersect, others, len(intersected - unioned)

ns = {"a": {1,2,5,10,12}, "b": {1,2,6,9,12,15}, "c": {1,2,7,8,15}}
for intersected, unioned, count in venn_count(ns):
    print('len({}{}) = {}'.format(' & '.join(sorted(intersected)),
                                  ' - ' + ' - '.join(sorted(unioned)) if unioned else '',
                                  count))
Running it gives:
len(a - b - c) = 2
len(b - a - c) = 2
len(c - a - b) = 2
len(a & b - c) = 1
len(a & c - b) = 0
len(b & c - a) = 1
len(a & b & c) = 2