Python 列表过滤:从列表中移除子集
使用Python,如何通过有序子集匹配来减少一个列表的列表 [[..],[..],..]
?
在这个问题中,如果列表 M
包含了列表 L
的所有元素,并且顺序也一样,那么我们就说 L
是 M
的一个子集。举个例子,列表 [1,2] 是列表 [1,2,3] 的一个子集,但不是列表 [2,1,3] 的子集。
示例输入:
a. [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
期望结果:
a. [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
更多示例:
L = [[1, 2, 3, 4, 5, 6, 7], [1, 2, 5, 6]]
- 不需要减少
L = [[1, 2, 3, 4, 5, 6, 7],
, [1, 2, 3]
[1, 2, 4, 8]]
- 需要减少
L = [[1, 2, 3, 4, 5, 6, 7], [7, 6, 5, 4, 3, 2, 1]]
- 不需要减少
(抱歉给大家带来了困惑,数据集不正确。)
10 个回答
7
这段代码应该在内存使用上非常高效。除了存储你最开始的列表列表之外,这段代码几乎不需要额外的内存(没有创建临时的集合或列表副本)。
def is_subset(needle,haystack):
""" Check if needle is ordered subset of haystack in O(n) """
if len(haystack) < len(needle): return False
index = 0
for element in needle:
try:
index = haystack.index(element, index) + 1
except ValueError:
return False
else:
return True
def filter_subsets(lists):
""" Given list of lists, return new list of lists without subsets """
for needle in lists:
if not any(is_subset(needle, haystack) for haystack in lists
if needle is not haystack):
yield needle
my_lists = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3],
[2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
print list(filter_subsets(my_lists))
>>> [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
而且,为了好玩,这里有一个一行代码的写法:
def filter_list(L):
return [x for x in L if not any(set(x)<=set(y) for y in L if x is not y)]
11
这可以简化一下,不过:
l = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
l2 = l[:]
for m in l:
for n in l:
if set(m).issubset(set(n)) and m != n:
l2.remove(m)
break
print l2
[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
0
感谢所有给我建议的人,特别是在处理我有时不太准确的数据集时。根据@hughdbrown的解决方案,我做了一些修改,让它更符合我的需求:
我的修改是使用一个滑动窗口来检查目标,以确保能找到子序列。我觉得我应该用一个更合适的词来描述我的问题,而不是用“集合”。
def is_sublist_of_any_list(cand, lists):
# Compare candidate to a single list
def is_sublist_of_list(cand, target):
try:
i = 0
try:
start = target.index(cand[0])
except:
return False
while start < (len(target) + len(cand)) - start:
if cand == target[start:len(cand)]:
return True
else:
start = target.index(cand[0], start + 1)
except ValueError:
return False
# See if candidate matches any other list
return any(is_sublist_of_list(cand, target) for target in lists if len(cand) <= len(target))
# Compare candidates to all other lists
def super_lists(lists):
a = [cand for i, cand in enumerate(lists) if not is_sublist_of_any_list(cand, lists[:i] + lists[i+1:])]
return a
lists = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
expect = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
def test():
out = super_lists(list(lists))
print "In : ", lists
print "Out : ", out
assert (out == expect)
结果:
In : [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Out : [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]