Why does Python's itertools.permutations produce duplicates? (When the original list contains duplicates)


Asked 2025-04-16 20:35

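Everyone knows that n distinct symbols can be arranged in n! different ways. When the symbols are not all distinct, however, the usual convention in mathematics and elsewhere is to count only the distinct permutations. So the permutations of the list [1, 1, 2] are usually taken to be
[1, 1, 2], [1, 2, 1], [2, 1, 1]. Indeed, the following C++ code prints exactly these three permutations: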

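#include <algorithm>
#include <iostream>
int main() {
    int a[] = {1, 1, 2};  // must start sorted to visit every distinct permutation
    do {
        std::cout << a[0] << " " << a[1] << " " << a[2] << std::endl;
    } while (std::next_permutation(a, a + 3));
}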

Python's itertools.permutations, on the other hand, seems to print something else:

import itertools
for a in itertools.permutations([1, 1, 2]):
    print(a)

This code prints

(1, 1, 2)
(1, 2, 1)
(1, 1, 2)
(1, 2, 1)
(2, 1, 1)
(2, 1, 1)

As user Artsiom Rudzenka pointed out in an answer, the Python documentation says:

Elements are treated as unique based on their position, not on their value.
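One way to read "unique by position": permuting a list behaves like permuting its indices 0..n-1 and then looking up the value at each index. The sketch below checks that equivalence for [1, 1, 2]; the helper name `permute_by_index` is hypothetical and not part of itertools.

```python
from itertools import permutations

def permute_by_index(pool):
    """Hypothetical helper: permute the positions 0..n-1, then map each index to its value."""
    pool = tuple(pool)
    for indices in permutations(range(len(pool))):
        yield tuple(pool[i] for i in indices)

data = [1, 1, 2]
by_index = list(permute_by_index(data))
print(by_index == list(permutations(data)))  # True
print(len(by_index), len(set(by_index)))     # 6 3
```

Because the three positions are distinct, there are 3! = 6 position permutations, even though only 3 of them differ by value.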

My question is: why was this design decision made?

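It seems that following the usual convention would give more useful results (and it is usually exactly what I want)... or is there some application of Python's behaviour that I'm missing?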

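[Or is this an implementation issue? The algorithm behind next_permutation (explained on StackOverflow here (by me), and shown here to be O(1) amortized) seems efficient and implementable in Python. Is Python doing something even more efficient because it does not guarantee lexicographic order by value? If so, was the efficiency gain considered worth it?]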


6 answers

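It's easy to get the behaviour you prefer by wrapping itertools.permutations, which may have influenced the decision. As the documentation describes, itertools is a collection of building blocks for constructing your own iterators.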

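from itertools import permutations

def unique(iterable):
    # Yield each item the first time it appears; skip later duplicates.
    seen = set()
    for x in iterable:
        if x in seen:
            continue
        seen.add(x)
        yield x

for a in unique(permutations([1, 1, 2])):
    print(a)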

(1, 1, 2)
(1, 2, 1)
(2, 1, 1)

However, as pointed out in the comments, this may not be as efficient as you'd like:

>>> %timeit iterate(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]))
1 loops, best of 3: 4.27 s per loop

>>> %timeit iterate(unique(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2])))
1 loops, best of 3: 13.2 s per loop
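The slowdown makes sense once you count tuples: itertools.permutations still yields all n! positional permutations, and unique has to hash every one of them, although only n! / (m1! · m2! · ...) are distinct, where m_i is how many times each value occurs. A small sketch of that count for the 11-element input above (the function name is illustrative only):

```python
from collections import Counter
from math import factorial, prod

def count_distinct_permutations(items):
    """Multinomial coefficient n! / (m1! * m2! * ...), with m_i the multiplicity of each value."""
    counts = Counter(items)
    return factorial(len(items)) // prod(factorial(m) for m in counts.values())

data = [1] * 10 + [2]
print(factorial(len(data)))               # 39916800 tuples produced by permutations()
print(count_distinct_permutations(data))  # 11 distinct permutations
```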

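If enough people are interested, a new function could perhaps be added to itertools, or an optional argument to itertools.permutations, to generate permutations without duplicates more efficiently.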

Answered 2025-04-16 by Python大师


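I find Gareth Rees's answer the most appealing explanation (short of an answer from the Python library designers): Python's itertools.permutations does not compare the values of the elements. Come to think of it, that is exactly what the question asks about, but I now see how it can be an advantage, depending on what itertools.permutations is typically used for.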

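For completeness, I compared three ways of generating all distinct permutations. Method 1 uses Python's itertools.permutations, as in zeekay's answer; it is very inefficient in memory and time but needs the least code. Method 2 is a generator version of C++'s next_permutation, taken from this blog post. Method 3 is one I wrote that follows C++'s next_permutation algorithm more closely; it modifies the list in place (I didn't make it very general).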

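def next_permutationS(l):
    """Rearrange l in place into its next lexicographic permutation.

    Returns True if one exists; otherwise l is reset to ascending
    order (the first permutation) and False is returned.
    """
    n = len(l)
    # Step 1: Find tail (the longest non-increasing suffix)
    last = n - 1  # tail is from `last` to end
    while last > 0:
        if l[last - 1] < l[last]:
            break
        last -= 1
    # Step 2: Increase the number just before tail by swapping it
    # with the rightmost tail element that is larger than it
    if last > 0:
        small = l[last - 1]
        big = n - 1
        while l[big] <= small:
            big -= 1
        l[last - 1], l[big] = l[big], small
    # Step 3: Reverse tail
    i = last
    j = n - 1
    while i < j:
        l[i], l[j] = l[j], l[i]
        i += 1
        j -= 1
    return last > 0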
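To enumerate every distinct permutation with next_permutationS, one can start from the sorted list (the lexicographically smallest arrangement) and keep calling it until it returns False. The sketch below wraps that loop in a generator; it carries a condensed but equivalent copy of next_permutationS so it runs on its own, the generator name `distinct_permutations` is hypothetical, and it assumes the elements support `<`.

```python
def next_permutationS(l):
    """Step l in place to its next lexicographic permutation; return False after wrapping around."""
    n = len(l)
    last = n - 1
    while last > 0 and not l[last - 1] < l[last]:   # find the non-increasing tail
        last -= 1
    if last > 0:
        big = n - 1
        while not l[last - 1] < l[big]:             # rightmost element larger than the pivot
            big -= 1
        l[last - 1], l[big] = l[big], l[last - 1]
    l[last:] = reversed(l[last:])                    # reverse the tail
    return last > 0

def distinct_permutations(items):
    """Hypothetical generator: yield each distinct permutation once, in lexicographic order."""
    l = sorted(items)  # sorted copy, so the caller's list is left untouched
    while True:
        yield tuple(l)
        if not next_permutationS(l):
            break

print(list(distinct_permutations([1, 1, 2])))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

Repeated values are never produced twice here because the algorithm only ever moves to a strictly greater arrangement, so no filtering set is needed.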

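Here are some results. I now have even more respect for Python's built-in function: when the elements are all (or almost all) distinct, it is about three to four times as fast as the other methods. Of course, when there are many repeated elements, using it is a bad idea.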

Some results ("us" means microseconds, "ms" milliseconds, "s" seconds). Column m_itertoolsp is method 1 (itertools.permutations with duplicates filtered out), m_nextperm_b is method 2 (the blog's generator) and m_nextperm_s is method 3 (next_permutationS):

input l                                 m_itertoolsp  m_nextperm_b  m_nextperm_s
[1, 1, 2]                               5.98 us       12.3 us       7.54 us
[1, 2, 3, 4, 5, 6]                      0.63 ms       2.69 ms       1.77 ms
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]         6.93 s        13.68 s       8.75 s

[1, 2, 3, 4, 6, 6, 6]                   3.12 ms       3.34 ms       2.19 ms
[1, 2, 2, 2, 2, 3, 3, 3, 3, 3]          2.4 s         5.87 ms       3.63 ms
[1, 1, 1, 1, 1, 1, 1, 1, 1, 2]          2.32 s        89.9 us       51.5 us
[1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4]    429 s         361 ms        228 ms

If anyone wants to dig deeper, the code is here.

Answered 2025-04-16 by Python大师


I can't speak for the designer of itertools.permutations (Raymond Hettinger), but there seem to be a couple of points in favour of the design:

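First, if you used a next_permutation-style approach, you would be restricted to passing in objects that support a linear ordering, whereas itertools.permutations accepts objects of any kind. Imagine how annoying this would be: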

>>> list(itertools.permutations([1+2j, 1-2j, 2+1j, 2-1j]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'complex' and 'complex'
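For contrast, the real itertools.permutations accepts these values without complaint because it never compares them; only an order-based approach, which has to sort the input first, would run into the TypeError. A quick check:

```python
from itertools import permutations

values = [1+2j, 1-2j, 2+1j, 2-1j]

# permutations() only rearranges positions, so unorderable complex numbers are fine.
perms = list(permutations(values))
print(len(perms))  # 24

# An order-based method would need to sort first, which fails for complex numbers.
sort_failed = False
try:
    sorted(values)
except TypeError as exc:
    sort_failed = True
    print("sorted() failed:", exc)
```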

Second, itertools.permutations does not test objects for equality, which avoids the overhead of calling their __eq__ method in the common case where it isn't needed.
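A small experiment makes this visible. The class `Counted` below is a hypothetical wrapper that counts calls to __eq__: generating the permutations triggers none, while removing value-duplicates afterwards (here with a set) has to compare elements.

```python
from itertools import permutations

class Counted:
    """Hypothetical value wrapper that counts calls to __eq__."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.eq_calls += 1
        return isinstance(other, Counted) and self.value == other.value

    def __hash__(self):
        return hash(self.value)

items = [Counted(1), Counted(1), Counted(2)]

perms = list(permutations(items))
calls_during_permutations = Counted.eq_calls
print(len(perms), calls_during_permutations)  # 6 0

# Deduplicating by value forces equality tests between the two Counted(1) objects.
distinct = set(perms)
print(len(distinct), Counted.eq_calls > 0)    # 3 True
```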

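In short, itertools.permutations solves the common case reliably and efficiently. There is certainly an argument that itertools should provide a function that avoids duplicate permutations, but such a function should complement itertools.permutations, not replace it. Why not write such a function yourself and submit a patch?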
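One possible shape for such a complementary function, offered as a sketch rather than a proposed patch: it tracks how many copies of each value remain in a collections.Counter, so elements must be hashable but need not be orderable, and it never generates a duplicate that would have to be filtered out later. The name `multiset_permutations` is hypothetical; it is not part of itertools.

```python
from collections import Counter

def multiset_permutations(iterable):
    """Hypothetical generator: yield each distinct permutation of iterable exactly once.

    Values are grouped with a Counter, so elements must be hashable but need not
    be orderable. Output order follows the first appearance of each value.
    """
    items = list(iterable)
    remaining = Counter(items)  # how many copies of each value are still unused
    n = len(items)
    prefix = []

    def extend():
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for value in remaining:  # each distinct value is tried once per position
            if remaining[value] == 0:
                continue
            remaining[value] -= 1
            prefix.append(value)
            yield from extend()
            prefix.pop()
            remaining[value] += 1

    yield from extend()

print(list(multiset_permutations([1, 1, 2])))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
print(len(list(multiset_permutations([1+2j, 1+2j, 2-1j]))))  # 3
```

The trade-off mirrors the discussion above: this version relies on hashing (and therefore on __eq__ for colliding keys) instead of ordering, which is exactly the cost itertools.permutations avoids.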

Answered 2025-04-16 by Python大师
