How do I write a pager for Python iterators?

Question

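I'm looking for a way to "page through" a Python iterator. That is, I'd like to wrap a given iterator iter and a page_size in another iterator that returns the items from iter as a series of "pages". Each page is itself an iterator that yields up to page_size items.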

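I looked through itertools, and the closest thing I found is itertools.islice. In a sense, what I want is the opposite of itertools.chain: instead of chaining a series of iterators together into one, I'd like to break one iterator up into a series of smaller iterators. I was expecting to find a paging function in itertools but couldn't find one.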
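
To make the comparison with chain concrete, here is a small illustrative sketch (Python 3 syntax; the sample data and variable names are made up for the example). It shows itertools.islice pulling one page at a time off an iterator and itertools.chain.from_iterable gluing pages back together. It only illustrates the relationship between the two and is not a pager.

```python
import itertools

# Illustrative data: an iterator over 0..9 and a page size of 3.
items = iter(range(10))
page_size = 3

# islice pulls at most page_size items off the front of the iterator;
# calling it again continues from where the previous call stopped.
first_page = list(itertools.islice(items, page_size))
second_page = list(itertools.islice(items, page_size))

# chain.from_iterable goes the other way: it flattens pages into one stream.
rejoined = list(itertools.chain.from_iterable([first_page, second_page]))

print(first_page)   # [0, 1, 2]
print(second_page)  # [3, 4, 5]
print(rejoined)     # [0, 1, 2, 3, 4, 5]
```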

I came up with the following pager class and demonstration.

import itertools

class pager(object):
    """
    takes the iterable iter and page_size to create an iterator that "pages through" iter.  That is, pager returns a series of page iterators,
    each returning up to page_size items from iter.
    """
    def __init__(self,iter, page_size):
        self.iter = iter
        self.page_size = page_size
    def __iter__(self):
        return self
    def next(self):
        # if self.iter has not been exhausted, return the next slice
        # I'm using a technique from 
        # https://stackoverflow.com/questions/1264319/need-to-add-an-element-at-the-start-of-an-iterator-in-python
        # to check for iterator completion by cloning self.iter into 3 copies:
        # 1) self.iter gets advanced to the next page
        # 2) peek is used to check on whether self.iter is done
        # 3) iter_for_return is to create an independent page of the iterator to be used by caller of pager
        self.iter, peek, iter_for_return = itertools.tee(self.iter, 3)
        try:
            next_v = next(peek)
        except StopIteration: # catch the exception and then raise it
            raise StopIteration
        else:
            # consume the page from the iterator so that the next page is up in the next iteration
            # is there a better way to do this?
            # 
            for i in itertools.islice(self.iter,self.page_size): pass
            return itertools.islice(iter_for_return,self.page_size)



iterator_size = 10
page_size = 3

my_pager = pager(xrange(iterator_size),page_size)

# skip a page, then print out rest, and then show the first page
page1 = my_pager.next()

for page in my_pager:
    for i in page:
        print i
    print "----"

print "skipped first page: " , list(page1)

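I'm looking for some feedback and have the following questions:

Is there a pager already in itertools that I'm overlooking?
Cloning self.iter three times seems kludgy to me. One clone is to check whether self.iter has any more items. I decided to go with a technique Alex Martelli suggested (I'm aware that he also wrote about a wrapping technique). The second clone is to make the returned page independent of the internal iterator (self.iter). Is there a way to avoid making three clones?
Is there a better way to deal with the StopIteration exception than catching it and then raising it again? I'm tempted not to catch it at all and just let it bubble up.

Thanks!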

-Raymond
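
For reference, one possible direction for the second and third questions, offered only as a minimal sketch and not as a definitive answer: if it is acceptable to buffer one page in memory at a time (an assumption), each page can be materialized with itertools.islice, which removes the need for tee and for any explicit StopIteration handling. The function name paginate is hypothetical, and the code is written for Python 3, unlike the Python 2 code in the question.

```python
import itertools

def paginate(iterable, page_size):
    """Yield successive page iterators of up to page_size items each.

    Hypothetical helper: each page is buffered in a list, so a page
    stays usable even after later pages have been requested.
    """
    it = iter(iterable)
    while True:
        # Pull at most page_size items; an empty list means the source is done.
        page = list(itertools.islice(it, page_size))
        if not page:
            return  # ends the generator cleanly, no manual StopIteration
        yield iter(page)

# Same scenario as the question: skip a page, print the rest, then show page 1.
pages = paginate(range(10), 3)
page1 = next(pages)
rest = [list(page) for page in pages]
skipped = list(page1)
print(rest)     # [[3, 4, 5], [6, 7, 8], [9]]
print(skipped)  # [0, 1, 2]
```

The trade-off is that each page is read eagerly when it is requested, so this sketch would not suit pages too large to hold in memory.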
