Intersection complexity

20 votes

3 answers

18179 views

Asked 2025-04-17 06:09

In Python, you can get the intersection of two sets like this:

>>> s1 = {1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> s2 = {0, 3, 5, 6, 10}
>>> s1 & s2
{3, 5, 6}
>>> s1.intersection(s2)
{3, 5, 6}

Does anyone know the complexity of this intersection (&) algorithm?

Edit: Also, does anyone know what data structure Python sets use under the hood?

Tags: data-structures, complexity-analysis, set-theory, intersection, algorithm

3 Answers

To find the intersection of two sets of sizes m and n, you can do it in O(max{m,n} * log(min{m,n})). Here we assume m is much smaller than n.

1. Represent the two sets as lists/arrays (something sortable)
2. Sort the **smaller** list/array (cost: m*log m)
3. Repeat until every element of the bigger list has been checked:
    3.1 Sort the next **m** items of the bigger list (cost: m*log m)
    3.2 In a single pass, compare the smaller list with the m items you just sorted and keep the ones that appear in both (cost: m)
4. Return the new set

The loop in step 3 runs n/m times, and each iteration costs O(m*log m), so when m is much smaller than n the total time complexity is O(n log m).
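The steps above can be sketched in Python roughly as follows. This is a hedged illustration, not how the built-in `&` works (that one is hash-based); it assumes all elements are mutually orderable so they can be sorted, and the function name `block_sort_intersection` is hypothetical.

```python
def block_sort_intersection(small, big):
    """Intersect two sized collections by sorting the smaller one and
    merging it against successive sorted blocks of the larger one.
    Hypothetical helper; assumes all elements are mutually orderable."""
    if len(small) > len(big):
        small, big = big, small              # make `small` the m-sized side
    a = sorted(set(small))                   # step 2: sort smaller side, O(m log m)
    if not a:
        return set()
    m = len(a)
    big = list(big)                          # step 1: something sliceable/sortable
    result = set()
    for start in range(0, len(big), m):      # step 3: about n/m blocks
        block = sorted(big[start:start + m]) # 3.1: sort next m items, O(m log m)
        i = j = 0
        while i < m and j < len(block):      # 3.2: one linear merge pass, O(m)
            if a[i] == block[j]:
                result.add(a[i])
                i += 1
                j += 1
            elif a[i] < block[j]:
                i += 1
            else:
                j += 1
    return result                            # step 4
```

For the question's sets, `block_sort_intersection(s1, s2)` returns `{3, 5, 6}`, the same as `s1 & s2`.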

I believe this is the best known lower bound.

answered 2025-04-17 by Python大师


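The data structure behind sets is a hash table, so lookups and insertions are typically fast: O(1) on average.

The intersection algorithm loops min(len(s1), len(s2)) times. Each iteration does one lookup and, if the element is found, one insertion. In pure Python, it looks roughly like this: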
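As a rough picture of what "hash table" means here, below is a toy open-addressing hash set with linear probing. It is only an illustration: CPython's real set differs in its probing scheme, resize policy and deletion handling, and the class name `ToyHashSet` and the 2/3 load-factor threshold are assumptions for this sketch. A lookup hashes the element to a starting slot and scans forward until it finds the element or an empty slot; keeping the table sparse keeps those scans short, which is where the O(1) average comes from.

```python
class ToyHashSet:
    """Toy open-addressing hash set with linear probing (illustrative only).
    Cannot store None, since None marks an empty slot."""

    def __init__(self):
        self._slots = [None] * 8   # capacity is kept a power of two
        self._size = 0

    def _probe(self, elem):
        """Index of elem's slot, or of the empty slot where it would go."""
        mask = len(self._slots) - 1
        i = hash(elem) & mask
        while self._slots[i] is not None and self._slots[i] != elem:
            i = (i + 1) & mask     # linear probing: try the next slot
        return i

    def __contains__(self, elem):
        return self._slots[self._probe(elem)] is not None

    def add(self, elem):
        i = self._probe(elem)
        if self._slots[i] is None:
            self._slots[i] = elem
            self._size += 1
            # Assumed threshold: grow once the table is more than 2/3 full,
            # so there is always an empty slot to stop each probe.
            if self._size * 3 > len(self._slots) * 2:
                self._resize()

    def _resize(self):
        old = [e for e in self._slots if e is not None]
        self._slots = [None] * (len(self._slots) * 2)
        self._size = 0
        for e in old:
            self.add(e)

    def __len__(self):
        return self._size

    def __iter__(self):
        return (e for e in self._slots if e is not None)
```

Because it provides `len`, iteration and `in`, this toy class can also stand in for `self`/`other` in the pure-Python intersection loop below.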

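    def intersection(self, other):
        # Iterate over the smaller set and probe the larger one,
        # so the loop runs min(len(self), len(other)) times.
        if len(self) <= len(other):
            little, big = self, other
        else:
            little, big = other, self
        result = set()
        for elem in little:
            if elem in big:        # average O(1) hash-table lookup
                result.add(elem)   # average O(1) insertion
        return result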
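To check the loop-count claim on the question's own sets, here is a standalone variant of the same loop that also counts its membership tests. The function name `intersection_with_count` and the counter are hypothetical additions for illustration only.

```python
def intersection_with_count(a, b):
    """Same loop as above, as a plain function that also reports how many
    membership tests it performed (hypothetical, for illustration)."""
    little, big = (a, b) if len(a) <= len(b) else (b, a)
    result = set()
    lookups = 0
    for elem in little:        # iterate over the smaller set only
        lookups += 1
        if elem in big:        # average O(1) hash lookup
            result.add(elem)
    return result, lookups


if __name__ == "__main__":
    s1 = {1, 2, 3, 4, 5, 6, 7, 8, 9}
    s2 = {0, 3, 5, 6, 10}
    print(intersection_with_count(s1, s2))
```

With s1 (9 elements) and s2 (5 elements), it iterates over s2, so it performs 5 lookups and returns `{3, 5, 6}`.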

answered 2025-04-17 by Python大师


The answer to this seems to be just a search-engine query away. You can also go straight to the time-complexity page on python.org. To summarize:

Average:     O(min(len(s), len(t)))
Worst case:  O(len(s) * len(t))
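The worst case arises when many elements collide in the hash table, so each lookup has to compare against many stored entries. One hedged way to provoke it in CPython is to give every element the same hash and count equality comparisons. The class `Colliding` and the helper `count_intersection_comparisons` are hypothetical, built only for this demonstration, and exact counts depend on CPython's probing details.

```python
class Colliding:
    """Hypothetical key type: every instance hashes to the same value,
    which forces the hash table into its worst case."""

    eq_calls = 0  # class-level counter of __eq__ invocations

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # constant hash: all keys collide

    def __eq__(self, other):
        Colliding.eq_calls += 1
        return isinstance(other, Colliding) and self.value == other.value


def count_intersection_comparisons(m, n):
    """Build two disjoint sets of colliding keys (sizes m and n) and count
    the __eq__ calls made while computing their intersection."""
    a = {Colliding(i) for i in range(m)}
    b = {Colliding(i) for i in range(m, m + n)}
    Colliding.eq_calls = 0  # ignore comparisons made while building the sets
    result = a & b
    return len(result), Colliding.eq_calls


if __name__ == "__main__":
    for m, n in [(10, 10), (20, 20), (40, 40)]:
        size, calls = count_intersection_comparisons(m, n)
        print(m, n, size, calls)
```

In CPython, each failed lookup has to walk past every colliding entry, so the comparison count grows on the order of len(s) * len(t) rather than min(len(s), len(t)).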

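Edit: As Raymond points out below, the "worst case" is unlikely to occur in practice. I originally included it for completeness and have left it in to give context to the discussion below, but I think Raymond is right.

answered 2025-04-17 by Python大师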
