Python: 快速大整数键字典

3 投票

4 回答

1651 浏览

提问于 2025-04-16 17:07

我有一个超过10,000个整数的列表。这些整数的值可以非常大，最高可以达到10的27次方。现在我想创建所有这些整数的配对，并计算它们的和。接着，我想找出和相同的不同配对。

举个例子：

l[0] = 4
l[1] = 3
l[2] = 6
l[3] = 1
...

pairs[10] = [(0,2)] # 10 is the sum of the values of l[0] and l[2]
pairs[7] = [(0,1), (2,3)] # 7 is the sum of the values of l[0] and l[1] or l[2] and l[3]
pairs[5] = [(0,3)]
pairs[9] = [(1,2)]
...

我想要的就是pairs[7]的内容。它给了我两个和相同的配对。

我已经这样实现了——但我想知道是否可以做得更快。目前，对于10,000个项目，在一台快速的机器上需要超过6个小时。（正如我所说，l的值以及pairs的键都是最高可达10的27次方的整数。）

l = [4,3,6,1]
pairs = {}
for i in range( len( l  )  ):
    for j in range(i+1, len( l ) ):
        s = l[i] + l[j]
        if not s in pairs:
            pairs[s] = []
        pairs[s].append((i,j))

# pairs = {9: [(1, 2)], 10: [(0, 2)], 4: [(1, 3)], 5: [(0, 3)], 7: [(0, 1), (2, 3)]}

编辑：我想补充一些背景信息，正如Simon Stelling所要求的。

我的目标是找到形式上的类比，比如

lays : laid :: says : said

在一个单词列表中，比如

[ lays, lay, laid, says, said, foo, bar ... ]

我已经有一个函数analogy(a,b,c,d)，如果a : b :: c : d，它会返回True。不过，我需要检查从列表中创建的所有可能的四元组，这样的复杂度大约是O((n^4)/2)。

作为预过滤，我想使用字符计数属性。这个属性表示在(a,d)和(b,c)中，每个字符的数量是相同的。例如，在“layssaid”中，我们有2个a，而在“laidsays”中也是如此。

所以到目前为止的想法是：

为每个单词创建一个“字符计数向量”，并将其表示为一个整数（列表中的项目l）
在pairs中创建所有配对，看看是否有“配对簇”，也就是说，是否有多个配对对应于特定的字符计数向量和。

这个方法是有效的，只是速度比较慢。复杂度降到了大约O((n^2)/2)，但这仍然很高，尤其是字典的查找和插入操作频繁。

数据结构算法效率组合数学字符计数大整数整数字典复杂度优化配对计算

4 个回答

给你一个小提示。可以看看 itertools.combinations。

这可能不是你想要的（因为它存储的是值的组合，而不是索引的组合），但可以作为一个起始代码：

from itertools import combinations
for (a, b) in combinations(l, 2):
    pairs.setdefault(a + b, []).append((a, b))

回答于 2025-04-16 由 Python大师

分享举报

有一些简单的优化方法，比如把常量值存储在一个本地变量中，或者用 xrange 代替 range：

pairs = {}
len_l = len(l)
for i in xrange(len_l):
    for j in xrange(i+1, len_l):
        s = l[i] + l[j]
        res = pairs.setdefault(s, [])
        res.append((i,j))

不过，更聪明的做法可能是不要提前计算出列表，而是从概念上优化你的方法。你真正想要达到的目标是什么？你真的只是想计算你正在做的事情吗？还是说你打算把这个结果用在其他地方？那其他的地方是什么呢？

回答于 2025-04-16 由 Python大师

分享举报

最后，我想出了自己的解决办法，平均只用了原来一半的计算时间。

基本思路是：与其在不断扩大的字典中读写 n^2 次，不如先把所有的和收集到一个列表里。然后对这个列表进行排序。接着，在排序后的列表中查找相邻的相同项。

下面是代码：

from operator import itemgetter

def getPairClusters( l ):

    # first, we just store all possible pairs sequentially
    # clustering will happen later
    pairs = []

    for i in xrange( len( l)  ):
        for j in xrange(i+1, len( l ) ):
            pair = l[i] + l[j]
            pairs.append( ( pair, i, j ) )
    pairs.sort(key=itemgetter(0))

    # pairs = [ (4, 1, 3), (5, 0, 3), (7, 0, 1), (7, 2, 3), (9, 1, 2), (10, 0, 2)]
    # a list item of pairs now contains a tuple (like (4, 1, 3)) with
    # * the sum of two l items: 4
    # * the index of the two l items: 1, 3

    # now clustering starts
    # we want to find neighbouring items as
    # (7, 0, 1), (7, 2, 3)
    # (since 7=7)

    pairClusters = []

    # flag if we are within a cluster
    # while iterating over pairs list
    withinCluster = False

            # iterate over pair list
    for i in xrange(len(pairs)-1):
        if not withinCluster:
            if pairs[i][0] == pairs[i+1][0]:
                # if not within a cluster
                # and found 2 neighbouring same numbers:
                # init new cluster
                pairCluster = [ ( pairs[i][1], pairs[i][2] ) ]
                withinCluster = True
        else:
            # if still within cluster
            if pairs[i][0] == pairs[i+1][0]:
                pairCluster.append( ( pairs[i][1], pairs[i][2] ) )
            # else cluster has ended
            # (next neighbouring item has different number)
            else:
                pairCluster.append( ( pairs[i][1], pairs[i][2] ) )
                pairClusters.append(pairCluster)
                withinCluster = False

    return pairClusters

l = [4,3,6,1]

print getPairClusters(l)

回答于 2025-04-16 由 Python大师

分享举报

Python: 快速大整数键字典

4 个回答

撰写回答