Handling huge lists in Python
How should I handle a large list with more than 100 million strings?
Where do I even start with a list this big?
Here is an example that builds such a large list:
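```python
from itertools import combinations

# Each row needs a trailing comma; without it Python silently joins
# adjacent string literals, e.g. "As" "2h" becomes "As2h".
cards = [
    "2s", "3s", "4s", "5s", "6s", "7s", "8s", "9s", "10s", "Js", "Qs", "Ks", "As",
    "2h", "3h", "4h", "5h", "6h", "7h", "8h", "9h", "10h", "Jh", "Qh", "Kh", "Ah",
    "2d", "3d", "4d", "5d", "6d", "7d", "8d", "9d", "10d", "Jd", "Qd", "Kd", "Ad",
    "2c", "3c", "4c", "5c", "6c", "7c", "8c", "9c", "10c", "Jc", "Qc", "Kc", "Ac",
]

cardsInHand = 7
hands = list(combinations(cards, cardsInHand))  # materializes every hand in memory
print(len(hands), "hand combinations in texas holdem poker")
```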
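To see why `list(combinations(...))` runs into trouble, here is a rough back-of-the-envelope sketch. It assumes 64-bit CPython, where each list slot is an 8-byte pointer and each 7-tuple is a separate object whose size `sys.getsizeof` reports; the card strings are shared with `cards` and not counted. `estimate_list_bytes` is a hypothetical helper, and the exact figure varies by Python version.

```python
import math
import sys


def estimate_list_bytes(n_items, sample_item):
    """Rough size of a list holding n_items objects shaped like sample_item.

    Assumes 64-bit CPython: one 8-byte pointer per list slot plus the object
    itself as reported by sys.getsizeof (shared strings are not counted).
    """
    pointer_size = 8  # assumption: 64-bit build
    return n_items * (pointer_size + sys.getsizeof(sample_item))


n_hands = math.comb(52, 7)  # number of 7-card hands from a 52-card deck
sample_hand = tuple("2s 3s 4s 5s 6s 7s 8s".split())
gib = estimate_list_bytes(n_hands, sample_hand) / 2**30
print(f"{n_hands} hands, roughly {gib:.1f} GiB for the list and its tuples")
```

The point is the order of magnitude: well over 10 GiB on a typical 64-bit build, before doing any work with the hands.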
5 answers
Answer (score: 3)
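Another option that doesn't require holding everything in memory is to use generators, which produce the data as a stream that you can process however you like. For example:
Count the total number of hands:
```python
sum(1 for _ in combinations(cards, 7))
```
Count the hands that contain the ace of clubs:
```python
sum(1 for hand in combinations(cards, 7) if 'Ac' in hand)
```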
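The two generator expressions above can be sanity-checked against closed-form counts: a 7-card hand is a combination, so there are C(52, 7) hands in total, and fixing the ace of clubs leaves C(51, 6) ways to choose the other six cards. This sketch assumes Python 3.8+ for `math.comb`; `count_hands` is a hypothetical helper, and it is checked on a small 10-card deck because streaming all 133 million hands in pure Python takes a while.

```python
from itertools import combinations
from math import comb


def count_hands(deck, k, required=None):
    """Count k-card hands lazily; only the running total is kept in memory."""
    return sum(1 for hand in combinations(deck, k)
               if required is None or required in hand)


# Closed-form counts for the real 52-card deck.
total = comb(52, 7)              # all 7-card hands
with_ace_of_clubs = comb(51, 6)  # the ace of clubs plus any 6 of the other 51

# Cross-check the generator approach on a small deck that is fast to enumerate.
small_deck = [f"c{i}" for i in range(10)]
assert count_hands(small_deck, 3) == comb(10, 3)
assert count_hands(small_deck, 3, required="c0") == comb(9, 2)
print(total, with_ace_of_clubs)
```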
Answer (score: 9)
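If you only want to iterate over all possible hands, to count them or to look for a particular kind of hand, you don't actually need to store them all in memory.
You can use the iterator directly instead of turning it into a list:
```python
from itertools import combinations

cardsInHand = 7
hands = combinations(cards, cardsInHand)  # lazy iterator over `cards` from the question
n = 0
for h in hands:
    n += 1
    # or do some other stuff here
print(n, "hand combinations in texas holdem poker.")
```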
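The answer reported `85900584 hand combinations in texas holdem poker.` That figure is C(49, 7), which is what you get when the commas at the ends of the rows of `cards` are missing: Python then joins "As" "2h" into "As2h" (and likewise "Ah2d" and "Ad2c"), leaving only 49 strings. With all 52 cards the loop counts 133784560, i.e. C(52, 7).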
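The same iterator can be used to search rather than count. This is one possible sketch, not the answer author's code: `first_matching` and `is_four_aces` are hypothetical names, `cards` is rebuilt programmatically in the question's order, and each card is assumed to be a string whose last character is the suit. Because `combinations` yields hands lazily, `islice` stops the scan as soon as enough matches have been found.

```python
from itertools import combinations, islice

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["s", "h", "d", "c"]
cards = [rank + suit for suit in SUITS for rank in RANKS]  # same order as the question


def is_four_aces(hand):
    """Example predicate: the hand contains all four aces."""
    return sum(card[:-1] == "A" for card in hand) == 4


def first_matching(hands, predicate, limit=1):
    """Return up to `limit` hands satisfying `predicate`, stopping early."""
    return list(islice((h for h in hands if predicate(h)), limit))


matches = first_matching(combinations(cards, 7), is_four_aces, limit=3)
for hand in matches:
    print(hand)
```

Only the matches are kept; every other hand is discarded as soon as the predicate rejects it.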
Answer (score: 11)
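Python's lists and strings are actually reasonably efficient, so as long as you have enough memory this shouldn't be a problem.
That said, if what you're storing is poker hands, you can come up with a more compact representation. For example, you could encode each card in one byte; seven cards then take 7 bytes, so a whole hand fits in a single 64-bit integer. You can then store these in a NumPy array, which is far more efficient than a Python list.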
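The one-byte-per-card idea can be sketched in plain Python: map each card to an index from 0 to 51, write the seven indices as seven bytes, and read those bytes back as one integer. That integer is below 2**56, so it fits in a 64-bit slot such as an element of a NumPy `uint64` array. `encode_hand` and `decode_hand` are hypothetical names, and little-endian byte order is an arbitrary choice.

```python
from itertools import combinations

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["s", "h", "d", "c"]
cards = [rank + suit for suit in SUITS for rank in RANKS]
card_to_byte = {card: i for i, card in enumerate(cards)}  # "2s" -> 0 ... "Ac" -> 51


def encode_hand(hand):
    """Pack up to 8 cards, one byte each, into a single int below 2**64."""
    return int.from_bytes(bytes(card_to_byte[c] for c in hand), "little")


def decode_hand(code, n_cards=7):
    """Inverse of encode_hand; n_cards is needed because card 0 encodes as a zero byte."""
    return tuple(cards[b] for b in code.to_bytes(n_cards, "little"))


hand = next(combinations(cards, 7))  # first hand: 2s 3s 4s 5s 6s 7s 8s
code = encode_hand(hand)
print(hand, "->", code)
assert decode_hand(code) == hand
```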
A NumPy version of this idea:
```python
>>> import itertools
>>> import numpy as np
>>> cards_to_bytes = {card: num for num, card in enumerate(cards)}  # "2s" -> 0 ... "Ac" -> 51
>>> hands = np.zeros((133784560, 7), dtype=np.int8)  # 133784560 == C(52, 7), one row per hand
>>> for num, hand in enumerate(itertools.combinations(cards, 7)):
...     hands[num] = [cards_to_bytes[card] for card in hand]
...
```
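To speed up the last line a little, you can use `hands[num] = list(map(cards_to_bytes.__getitem__, hand))` instead (in Python 3, `map` returns a lazy iterator, so it has to be wrapped in `list` before assigning it to a NumPy row).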
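If the per-row loop is still too slow, one alternative (a different technique from the answer's loop, offered only as a sketch) is to let NumPy consume a flat iterator. Because `cards_to_bytes` numbers the cards by position, `combinations(range(52), 7)` yields exactly the same index rows, in the same order, as `combinations(cards, 7)` does after the dictionary lookup, so the lookup can be skipped. `np.fromiter` with an explicit `count` preallocates the buffer, and `reshape` turns it into one row per hand. `build_hand_array` is a hypothetical helper, exercised here on a small deck because the full array still needs about 1 GB.

```python
from itertools import chain, combinations
from math import comb

import numpy as np


def build_hand_array(n_cards, hand_size):
    """Return a (C(n_cards, hand_size), hand_size) uint8 array of card indices.

    Row order matches itertools.combinations over a deck whose cards are
    numbered 0..n_cards-1 by position.
    """
    n_hands = comb(n_cards, hand_size)
    flat = np.fromiter(
        chain.from_iterable(combinations(range(n_cards), hand_size)),
        dtype=np.uint8,
        count=n_hands * hand_size,  # exact size, so NumPy allocates once
    )
    return flat.reshape(n_hands, hand_size)


small = build_hand_array(10, 3)  # 120 rows of 3 indices
print(small.shape, small[0], small[-1])
# Full deck (about 936 MB): hands = build_hand_array(52, 7)
```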
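That needs only 7 × 133784560 bytes, about 936 MB, so roughly 1 GB of memory. It could be reduced further by packing the cards more tightly: there are only 52 distinct cards, so each fits in 6 bits and a 7-card hand fits in 42 bits, i.e. 6 bytes (I can't remember the exact syntax for that bit packing offhand...).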
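One way to do that bit packing is to shift the seven 6-bit card indices into a single integer and store it in 6 bytes instead of 7, which brings all hands down to about 803 MB from about 936 MB. This is a hedged sketch: `pack6` and `unpack6` are hypothetical helpers, and putting the first card in the lowest bits is an arbitrary choice.

```python
BITS_PER_CARD = 6  # 2**6 = 64 >= 52 distinct cards
MASK = (1 << BITS_PER_CARD) - 1


def pack6(indices):
    """Pack card indices (each must fit in 6 bits) into one int, first card lowest."""
    value = 0
    for position, index in enumerate(indices):
        if not 0 <= index <= MASK:
            raise ValueError(f"card index does not fit in 6 bits: {index}")
        value |= index << (BITS_PER_CARD * position)
    return value


def unpack6(value, n_cards=7):
    """Inverse of pack6 for a hand of n_cards cards."""
    return [(value >> (BITS_PER_CARD * p)) & MASK for p in range(n_cards)]


hand = [0, 12, 25, 38, 51, 7, 30]     # seven card indices
packed = pack6(hand)                   # uses at most 7 * 6 = 42 bits
stored = packed.to_bytes(6, "little")  # 6 bytes per hand
assert unpack6(int.from_bytes(stored, "little")) == hand
```

Reading a hand back costs a few shifts and masks, which is the usual trade-off for the smaller footprint.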