Does pre-sorting improve itertools.combinations performance?

2 answers

User

#1 · edited 2024-04-19 18:57:22

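I'm fairly sure the main cause of the performance difference you observed is the cost of evaluating if (i < j) 49995000 times versus sorting a list of 10000 elements once, not the supposed difference between a sorted and an unsorted iterable.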

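combinations should do the same amount of work in both cases: it produces the same number of tuples either way, and it neither sorts nor compares the elements. It simply emits the tuples in lexicographic order of their positions in the input.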
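A small check of that claim, as a sketch on a toy list (the name `items` is illustrative and not from the original post). It shows that the tuples follow the input positions and that the output size is the same whether or not the input is sorted:

```python
from itertools import combinations
from math import comb

items = ["b", "c", "a"]  # deliberately unsorted toy input

pairs_unsorted = list(combinations(items, 2))
pairs_sorted = list(combinations(sorted(items), 2))

# Tuples follow input positions; the elements themselves are never compared.
print(pairs_unsorted)  # [('b', 'c'), ('b', 'a'), ('c', 'a')]
print(pairs_sorted)    # [('a', 'b'), ('a', 'c'), ('b', 'c')]

# Same amount of output either way: n choose 2.
print(len(pairs_unsorted) == len(pairs_sorted) == comb(len(items), 2))  # True
```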

To test properly whether sorting has any effect, do the following:

Run the same conditional check on the same data, once sorted and once unsorted:

sorted_data = sorted(data)

def test1():
    g = ((i,j) if (i < j) else (j,i) for i, j in combinations(sorted_data, 2))
    return len([(i, j) for i, j in g])

def test2():
    g = ((i,j) if (i < j) else (j,i) for i, j in combinations(data, 2))
    return len([(i, j) for i, j in g])

%timeit test1()
1 loops, best of 3: 23.5 s per loop

%timeit test2()
1 loops, best of 3: 24.6 s per loop

Run the tests without the conditional:

def test3():
    g = ((i,j) for i, j in combinations(sorted_data, 2))
    return len([(i, j) for i, j in g])

def test4():
    g = ((i,j) for i, j in combinations(data, 2))
    return len([(i, j) for i, j in g])

%timeit test3()
1 loops, best of 3: 20.7 s per loop

%timeit test4()
1 loops, best of 3: 21.3 s per loop
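The %timeit lines above need IPython. Outside IPython, roughly the same comparison can be sketched with the standard timeit module. The size `N` and the helper names `with_conditional` and `without_conditional` are assumptions made for this sketch, chosen so that it finishes quickly. They are not the values used for the timings above.

```python
import random
import timeit
from itertools import combinations

N = 300  # hypothetical small size so the sketch runs in well under a second
data = ["{0:04d}".format(i) for i in range(N)]
random.shuffle(data)
sorted_data = sorted(data)

def with_conditional(seq):
    # Normalise each pair so the smaller element comes first.
    g = ((i, j) if (i < j) else (j, i) for i, j in combinations(seq, 2))
    return len(list(g))

def without_conditional(seq):
    # Same pairs, but without the per-pair comparison.
    return len(list(combinations(seq, 2)))

cases = [
    ("sorted, with conditional", with_conditional, sorted_data),
    ("unsorted, with conditional", with_conditional, data),
    ("sorted, without conditional", without_conditional, sorted_data),
    ("unsorted, without conditional", without_conditional, data),
]

for label, func, seq in cases:
    # Best of 3 repeats, 5 calls each, similar in spirit to %timeit.
    best = min(timeit.repeat(lambda: func(seq), number=5, repeat=3))
    print(f"{label}: {best:.4f} s for 5 calls")
```

Absolute numbers at this size will differ greatly from the 10000-element timings, so only the relative ordering of the four cases is worth reading off.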

Why is the first method so quick compared to the second? filter is the main way I filter data. Until now, I assumed that the filter form was heavily optimized.

Using combinations generates fewer elements to check against the condition: 10000C2 = 49995000 for combinations, versus 100000000 (10000 × 10000) for product.
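The two counts can be reproduced directly. This is a minimal sketch: the closed-form values use n = 10000 as in the post, while the 50-element range that is actually iterated is an arbitrary small size picked for illustration.

```python
from itertools import combinations, product
from math import comb

n = 10_000
print(comb(n, 2))  # 49995000
print(n ** 2)      # 100000000

# The same relationship on a small input, counted by actually iterating.
small = range(50)
n_comb = sum(1 for _ in combinations(small, 2))
n_prod = sum(1 for _ in product(small, repeat=2))
print(n_comb, n_prod)  # 1225 2500
```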

Why does the GO term impact the first and second methods more than the third one?

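The first and second methods are affected by the additional characters because they perform 49995000 and 100000000 comparisons respectively. The third is affected only by the comparisons needed to sort 10000 items.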
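To get a feel for how few comparisons a sort needs relative to a per-pair check, here is a rough sketch that counts them. The `Counted` wrapper class is a hypothetical helper introduced only for this illustration, and n = 1000 is an arbitrary smaller size.

```python
import random
from math import comb

class Counted:
    """Wraps a string and counts every < comparison (illustrative helper)."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value

n = 1000
values = ["{0:04d}".format(i) for i in range(n)]
random.shuffle(values)

Counted.calls = 0
sorted(Counted(v) for v in values)  # sorted() only uses __lt__
sort_comparisons = Counted.calls

print(sort_comparisons)  # roughly on the order of n * log2(n)
print(comb(n, 2))        # 499500 comparisons for a per-pair check
```

The exact count from the sort depends on the shuffle, but it stays far below the number of pairs, which is why longer strings hurt the per-pair methods much more.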

After some tinkering, sorting does seem to make a difference, but nowhere near as large as the conditional does. I don't know what causes it.

from itertools import combinations
import random as rd

data = ["{0:04d}".format(i) for i in range(0, 10000)]  # Normalize str length
rd.shuffle(data)

sorted_data = sorted(data)
reversed_sorted_data = sorted_data[::-1]

def test1():
    g = list((i,j) if (i < j) else (j,i) for i, j in combinations(data, 2))
    print('unsorted with conditional: ', len(g))

%timeit test1()
# unsorted with conditional:  49995000
# unsorted with conditional:  49995000
# unsorted with conditional:  49995000
# unsorted with conditional:  49995000
# 1 loops, best of 3: 20.7 s per loop

def test2():
    g = list((i,j) if (i < j) else (j,i) for i, j in combinations(sorted_data, 2))
    print('sorted with conditional: ', len(g))

%timeit test2()
# sorted with conditional:  49995000
# sorted with conditional:  49995000
# sorted with conditional:  49995000
# sorted with conditional:  49995000
# 1 loops, best of 3: 19.6 s per loop

def test3():
    g = list((i,j) for i, j in combinations(data, 2))
    print('unsorted without conditional: ', len(g))

%timeit test3()
# unsorted without conditional:  49995000
# unsorted without conditional:  49995000
# unsorted without conditional:  49995000
# unsorted without conditional:  49995000
# 1 loops, best of 3: 15.7 s per loop

def test4():
    g = list((i,j) for i, j in combinations(sorted_data, 2))
    print('sorted without conditional: ', len(g))

%timeit test4()
# sorted without conditional:  49995000
# sorted without conditional:  49995000
# sorted without conditional:  49995000
# sorted without conditional:  49995000
# 1 loops, best of 3: 15.3 s per loop

def test5():
    g = list((i,j) for i, j in combinations(reversed_sorted_data, 2))
    print('reverse sorted without conditional: ', len(g))

%timeit test5()
# reverse sorted without conditional:  49995000
# reverse sorted without conditional:  49995000
# reverse sorted without conditional:  49995000
# reverse sorted without conditional:  49995000
# 1 loops, best of 3: 15 s per loop


User

#2 · edited 2024-04-19 18:57:22

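Not really surprising, given that combinations (on sorted input) only creates the pairs for which i<j holds, while product creates every pair, including those for which it does not. The if (i < j) else (j,i) in test1 is therefore redundant. Omitting this check in test1 drastically reduces the execution time, as shown below.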

With the i<j check:

shuffle done
49995000
49995000
49995000
31.66194307899991
49995000
49995000
49995000
37.66488860800018
49995000
49995000
49995000
22.706632076000005

Without the i<j check in test1:

^{pr2}$
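The redundancy argument in this answer can be checked directly. This is a minimal sketch on a 200-element list (a size chosen only to keep it fast), and it assumes distinct values. A reverse-sorted copy stands in for unsorted data so that the result is deterministic.

```python
from itertools import combinations

sorted_data = ["{0:04d}".format(i) for i in range(200)]  # already in ascending order
reversed_data = sorted_data[::-1]                         # stand-in for unsorted input

# On sorted, distinct input every pair already satisfies i < j ...
all_ordered_sorted = all(i < j for i, j in combinations(sorted_data, 2))
# ... whereas on reverse-sorted input no pair does.
any_ordered_reversed = any(i < j for i, j in combinations(reversed_data, 2))
print(all_ordered_sorted, any_ordered_reversed)  # True False

# So the conditional changes nothing when the input is sorted.
plain = list(combinations(sorted_data, 2))
normalised = [(i, j) if (i < j) else (j, i) for i, j in combinations(sorted_data, 2)]
print(plain == normalised)  # True
```

On unsorted input the conditional is not redundant, because it reorders the pairs that come out with the larger element first.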
