Handling huge lists in Python

10 votes
5 answers
40003 views
Asked 2025-04-17 18:20

How should I manage a huge list of more than 100 million strings?

Where do I even start with a list this large?

Here is an example of such a list:

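# Each row must end with a comma; otherwise Python concatenates the
# adjacent string literals ("As" "2h" becomes "As2h") and the list
# ends up with 49 elements instead of 52.
cards = [
    "2s", "3s", "4s", "5s", "6s", "7s", "8s", "9s", "10s", "Js", "Qs", "Ks", "As",
    "2h", "3h", "4h", "5h", "6h", "7h", "8h", "9h", "10h", "Jh", "Qh", "Kh", "Ah",
    "2d", "3d", "4d", "5d", "6d", "7d", "8d", "9d", "10d", "Jd", "Qd", "Kd", "Ad",
    "2c", "3c", "4c", "5c", "6c", "7c", "8c", "9c", "10c", "Jc", "Qc", "Kc", "Ac",
]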

from itertools import combinations

cardsInHand = 7
hands = list(combinations(cards,  cardsInHand))

print(str(len(hands)) + " hand combinations in texas holdem poker")

5 Answers

3

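Another option that does not need the memory is to use a generator to produce a stream of data that you can process however you like. For example:

Count the total number of hands:

sum(1 for x in combinations(cards, 7))

Count the hands that contain the ace of clubs:

sum(1 for x in combinations(cards, 7) if 'Ac' in x)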
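
The two one-liners above can be wrapped in a small helper and checked against the closed-form binomial counts. This is a minimal sketch, not part of the original answer: `count_hands`, `RANKS`, `SUITS` and `deck` are names made up for illustration, and it uses 3-card hands so that it finishes instantly (the same call works with a hand size of 7, it just takes much longer). `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb  # available in Python 3.8+

# Hypothetical helper names; the deck is built programmatically to avoid typos.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["s", "h", "d", "c"]
deck = [rank + suit for suit in SUITS for rank in RANKS]  # 52 cards


def count_hands(deck, hand_size, predicate=lambda hand: True):
    """Count the hands satisfying `predicate` without storing any of them."""
    return sum(1 for hand in combinations(deck, hand_size) if predicate(hand))


# 3-card hands keep the run short; pass hand_size=7 for the real problem.
total = count_hands(deck, 3)
with_ace_of_clubs = count_hands(deck, 3, lambda hand: "Ac" in hand)

# The streamed counts should match C(52, 3) and C(51, 2) respectively.
print(total, comb(52, 3))
print(with_ace_of_clubs, comb(51, 2))
```

Only one hand exists in memory at any time, so peak memory stays constant no matter how many combinations are generated.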
9

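If you only want to loop over all possible hands, to count them or to find hands of a particular kind, there is no need to hold them all in memory.

You can use the iterator directly instead of converting it to a list: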

from itertools import combinations

cardsInHand = 7
hands = combinations(cards,  cardsInHand)

n = 0
for h in hands:
    n += 1
    # or do some other stuff here

print("%d hand combinations in texas holdem poker." % n)

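Output with a full 52-card deck: 133784560 hand combinations in texas holdem poker. (A result of 85900584, which is C(49, 7), means the card list is missing the commas at the ends of its first three rows: Python concatenates adjacent string literals, so "As" "2h" becomes "As2h" and the list has only 49 elements.)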

11

Python lists and strings are actually reasonably efficient, so if you have enough memory this should not be a problem.
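
To get a rough idea of what "enough memory" means here, the sketch below estimates the footprint of `list(combinations(cards, 7))` from the size of a single hand tuple. It is a back-of-the-envelope estimate under stated assumptions, not a measurement: `sys.getsizeof` reports CPython-specific object sizes, the card strings are shared between tuples and therefore not counted, and list over-allocation is ignored. The names `deck`, `n_hands`, `bytes_per_hand` and `estimate_gib` are illustrative.

```python
import struct
import sys
from itertools import combinations
from math import comb  # available in Python 3.8+

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = [rank + suit for suit in "shdc" for rank in RANKS]  # 52 cards

n_hands = comb(len(deck), 7)  # number of 7-card hands, C(52, 7)

# Take one hand from the iterator; every hand tuple has the same size.
sample_hand = next(combinations(deck, 7))

# Each hand costs one tuple object plus one pointer slot in the outer list.
pointer_size = struct.calcsize("P")
bytes_per_hand = sys.getsizeof(sample_hand) + pointer_size

estimate_gib = n_hands * bytes_per_hand / 2**30
print("%d hands, about %d bytes each, roughly %.1f GiB in total"
      % (n_hands, bytes_per_hand, estimate_gib))
```

On a typical 64-bit CPython build this comes out at well over 10 GiB, which is why the list version is only practical on a machine with a lot of RAM.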

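However, if what you are storing is poker hands, you can come up with a much more compact representation. For example, you can encode each card in one byte, so a whole 7-card hand fits in a single 64-bit integer (7 of its 8 bytes). You can then store the hands in a NumPy array, which is far more memory-efficient than a Python list.

For example: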

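>>> import itertools
>>> import numpy as np
>>> # Map each card string to a small integer (0..51) that fits in one byte
>>> cards_to_bytes = dict((card, num) for (num, card) in enumerate(cards))
>>> # One row of 7 int8 values per hand; 133784560 == C(52, 7)
>>> hands = np.zeros((133784560, 7), dtype=np.int8)
>>> for num, hand in enumerate(itertools.combinations(cards, 7)):
...     hands[num] = [cards_to_bytes[card] for card in hand]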

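To speed up that last line, you can use hands[num] = list(map(cards_to_bytes.__getitem__, hand)) instead. (The list() call is needed on Python 3, where map returns an iterator; on Python 2 it is harmless.)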

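This takes only 7 * 133784560 bytes, or roughly 1 GB of memory (about 0.94 GB). That could be cut down further by packing the cards more tightly. Note that four cards do not fit in a single byte: a card drawn from 52 needs 6 bits, so four cards take three bytes and a 7-card hand takes 42 bits, which fits in 6 bytes. (I can't remember the syntax for doing that off the top of my head...)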
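
One way to make the tighter packing concrete: a card drawn from 52 needs 6 bits, so 7 cards take 42 bits and fit in 6 bytes rather than 7. The sketch below is only an illustration under that assumption, not the answerer's code. `pack_hand`, `unpack_hand`, `card_to_id` and `deck` are hypothetical names, and it packs just the first 1000 hands so that it runs quickly.

```python
from itertools import combinations, islice

import numpy as np

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = [rank + suit for suit in "shdc" for rank in RANKS]  # 52 cards
card_to_id = {card: i for i, card in enumerate(deck)}  # ids 0..51 fit in 6 bits


def pack_hand(hand):
    """Pack a 7-card hand into 6 bytes, using 6 bits per card."""
    packed = 0
    for card in hand:
        packed = (packed << 6) | card_to_id[card]
    return packed.to_bytes(6, "big")  # 7 * 6 = 42 bits, stored in 48


def unpack_hand(row, hand_size=7):
    """Inverse of pack_hand: turn a row of 6 uint8 values back into card strings."""
    packed = int.from_bytes(row.tobytes(), "big")
    ids = []
    for _ in range(hand_size):
        ids.append(packed & 0x3F)  # the lowest 6 bits hold the last card packed
        packed >>= 6
    return tuple(deck[i] for i in reversed(ids))


# Demo on the first 1000 hands only; the full table would need C(52, 7) rows.
sample = list(islice(combinations(deck, 7), 1000))
hands = np.zeros((len(sample), 6), dtype=np.uint8)
for num, hand in enumerate(sample):
    hands[num] = np.frombuffer(pack_hand(hand), dtype=np.uint8)

print(hands.nbytes)           # 6 bytes per hand
print(unpack_hand(hands[0]))  # round-trips to the first hand
```

At full scale this layout would need 6 * 133784560 bytes, roughly 0.8 GB, compared with about 0.94 GB for one byte per card. The price is that individual cards are no longer directly addressable and each hand has to be unpacked before use.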
