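How do I yield results in sorted order from sorted iterators in Python?
Is there a better way to merge a bunch of already-sorted iterators into one, so that the result is also yielded in sorted order? I think the code below works, but I feel like there is a cleaner, more concise way of doing it that I'm missing.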
def sortIters(*iterables, **kwargs):
    key = kwargs.get('key', lambda x : x)
    nextElems = {}
    currentKey = None
    for g in iterables:
        try:
            nextElems[g] = g.next()
            k = key(nextElems[g])
            if currentKey is None or k < currentKey:
                currentKey = k
        except StopIteration:
            pass  # iterator was empty
    while nextElems:
        minKey = None
        stoppedIters = set()
        for g, item in nextElems.iteritems():
            k = key(item)
            if k == currentKey:
                yield item
                try:
                    nextElems[g] = g.next()
                except StopIteration:
                    stoppedIters.add(g)
            minKey = k if minKey is None else min(k, minKey)
        currentKey = minKey
        for g in stoppedIters:
            del nextElems[g]
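The use case is that I have a bunch of CSV files that I need to merge according to some sorted field. They are big enough that I don't want to read them all into a list and sort that. I'm using Python 2.6, but if there's a solution for Python 3 I'd still be interested in seeing it.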
1 Answer
Yes, you want heapq.merge(), which does exactly one thing: it iterates over multiple already-sorted iterators in sorted order.
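import csv
import heapq
from itertools import imap  # Python 2 only; in Python 3 the built-in map is lazy

def sortkey(row):
    # Pair each row with its sort field (column 5) so merge() compares on it.
    return (row[5], row)

def unwrap(key):
    # Drop the sort field again and return the original row.
    sortkey, row = key
    return row

FILE_LIST = map(open, ['foo.csv', 'bar.csv'])
# Apply sortkey to the rows of each reader, not to the reader objects.
input_iters = [imap(sortkey, csv.reader(f)) for f in FILE_LIST]
output_iter = imap(unwrap, heapq.merge(*input_iters))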
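Since the question also asks about Python 3: from Python 3.5 onward, heapq.merge() accepts a key argument, so the wrap/unwrap step should not be needed there. Here is a minimal sketch under a few assumptions: the helper name merge_sorted_csv is made up for illustration, the two in-memory buffers stand in for foo.csv and bar.csv, and each input is already sorted on the chosen column. Note that csv.reader yields strings, so the comparison is a string comparison unless the key converts the field.

```python
import csv
import heapq
import io
from operator import itemgetter

def merge_sorted_csv(files, column):
    """Lazily merge rows from already-sorted CSV file objects by one column.

    Hypothetical helper for illustration; requires Python 3.5+ for key=.
    """
    readers = (csv.reader(f) for f in files)
    # heapq.merge pulls one row at a time from each reader, so the files
    # are never loaded into memory as a whole.
    return heapq.merge(*readers, key=itemgetter(column))

# In-memory stand-ins for foo.csv and bar.csv, each sorted on column 0.
foo = io.StringIO("apple,1\ncherry,3\n")
bar = io.StringIO("banana,2\ndate,4\n")

merged = list(merge_sorted_csv([foo, bar], column=0))
for row in merged:
    print(row)
```

With real files you would pass objects opened with open(name, newline='') instead of the StringIO buffers, and iterate over the result directly rather than calling list() on it, so that the merge stays lazy.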