Python 列表过滤：从列表中移除子集

14 投票

10 回答

19811 浏览

提问于 2025-04-15 13:49

使用Python，如何通过有序子集匹配来减少一个列表的列表 [[..],[..],..]？

在这个问题中，如果列表 M 包含了列表 L 的所有元素，并且顺序也一样，那么我们就说 L 是 M 的一个子集。举个例子，列表 [1,2] 是列表 [1,2,3] 的一个子集，但不是列表 [2,1,3] 的子集。

示例输入：

a. [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]

期望结果：

a. [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69],  [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]

更多示例：

L = [[1, 2, 3, 4, 5, 6, 7], [1, 2, 5, 6]] - 不需要减少

L = [[1, 2, 3, 4, 5, 6, 7], ~~[1, 2, 3]~~, [1, 2, 4, 8]] - 需要减少

L = [[1, 2, 3, 4, 5, 6, 7], [7, 6, 5, 4, 3, 2, 1]] - 不需要减少

(抱歉给大家带来了困惑，数据集不正确。)

列表操作数据结构数据处理编程逻辑算法设计列表过滤有序子集子集匹配

10 个回答

这段代码应该在内存使用上非常高效。除了存储你最开始的列表列表之外，这段代码几乎不需要额外的内存（没有创建临时的集合或列表副本）。

def is_subset(needle,haystack):
   """ Check if needle is ordered subset of haystack in O(n)  """

   if len(haystack) < len(needle): return False

   index = 0
   for element in needle:
      try:
         index = haystack.index(element, index) + 1
      except ValueError:
         return False
   else:
      return True

def filter_subsets(lists):
   """ Given list of lists, return new list of lists without subsets  """

   for needle in lists:
      if not any(is_subset(needle, haystack) for haystack in lists
         if needle is not haystack):
         yield needle

my_lists = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], 
            [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]    
print list(filter_subsets(my_lists))

>>> [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]

而且，为了好玩，这里有一个一行代码的写法：

def filter_list(L):
    return [x for x in L if not any(set(x)<=set(y) for y in L if x is not y)]

回答于 2025-04-15 由 Python大师

分享举报

这可以简化一下，不过：

l = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
l2 = l[:]

for m in l:
    for n in l:
        if set(m).issubset(set(n)) and m != n:
            l2.remove(m)
            break

print l2
[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]

回答于 2025-04-15 由 Python大师

分享举报

感谢所有给我建议的人，特别是在处理我有时不太准确的数据集时。根据@hughdbrown的解决方案，我做了一些修改，让它更符合我的需求：

我的修改是使用一个滑动窗口来检查目标，以确保能找到子序列。我觉得我应该用一个更合适的词来描述我的问题，而不是用“集合”。

def is_sublist_of_any_list(cand, lists):
    # Compare candidate to a single list
    def is_sublist_of_list(cand, target):
        try:
            i = 0            
            try:
                start = target.index(cand[0])
            except:
                return False

            while start < (len(target) + len(cand)) - start:
                if cand == target[start:len(cand)]:
                    return True
                else:
                    start = target.index(cand[0], start + 1)
        except ValueError:
            return False

    # See if candidate matches any other list
    return any(is_sublist_of_list(cand, target) for target in lists if len(cand) <= len(target))

# Compare candidates to all other lists
def super_lists(lists):
    a = [cand for i, cand in enumerate(lists) if not is_sublist_of_any_list(cand, lists[:i] + lists[i+1:])]
    return a

lists = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
expect = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69],  [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]

def test():
    out = super_lists(list(lists))

    print "In  : ", lists
    print "Out : ", out

    assert (out == expect)

结果：

In  :  [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Out :  [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]

回答于 2025-04-15 由 Python大师

分享举报

Python 列表过滤：从列表中移除子集

10 个回答

撰写回答