Making a list from an iterator produces unexpected results

>>> import itertools, random, sys
>>> randoms = [random.randrange(10) for i in range(100)]
>>> [ (x[0],list(x[1])) for x in itertools.groupby(sorted(randoms))]
[(0, [0, 0, 0, 0, 0, 0, 0, 0]), (1, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), (2, [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]), (3, [3, 3, 3, 3, 3, 3]), (4, [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]), (5, [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]), (6, [6, 6, 6, 6, 6, 6, 6, 6, 6]), (7, [7, 7, 7, 7, 7]), (8, [8, 8, 8, 8, 8, 8, 8]), (9, [9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])]
>>> [ (x[0],list(x[1])) for x in list(itertools.groupby(sorted(randoms)))]
[(0, []), (1, []), (2, []), (3, []), (4, []), (5, []), (6, []), (7, []), (8, []), (9, [9])]
>>> sys.version
'3.3.3 (default, Dec 2 2013, 01:40:21) \n[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)]'

2 answers

User

Answer 1 · edited 2024-04-26 05:30:27

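I think this passage from the documentation explains the problem:

"The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list"

In your second example, converting to a list walks through all the groups immediately, but it does not iterate over the elements inside each group. By the time you finally call list(x[1]), it is too late: the underlying iterator has already been exhausted.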
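To make the difference concrete, here is a minimal sketch using a small fixed list instead of random data (the names data, eager, pairs and late are only illustrative). It compares consuming each group while groupby is still on it with consuming the groups after the outer list() call has already advanced past them. The last group is deliberately left out of the check, since the question's 3.3.3 output shows it behaving differently ([9]).

```python
import itertools

data = sorted([3, 1, 2, 3, 1, 3])  # groupby only merges adjacent equal items

# Consume each group while groupby is still positioned on it.
eager = [(key, list(group)) for key, group in itertools.groupby(data)]

# Materialize the (key, group) pairs first, then try to read the groups.
pairs = list(itertools.groupby(data))
late = [(key, list(group)) for key, group in pairs]

print(eager)                      # [(1, [1, 1]), (2, [2]), (3, [3, 3, 3])]
print([key for key, _ in late])   # [1, 2, 3]  (the keys survive)
print([grp for _, grp in late[:-1]])  # [[], []]  (earlier groups are empty)
```

The keys are plain values, so they survive the outer list() call; only the group iterators depend on the position of the shared source.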

User

Answer 2 · edited 2024-04-26 05:30:27

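The iterators that itertools.groupby yields for each group are not independent of the top-level iteration. You need to consume each one before advancing to the next group; otherwise the iterator becomes invalid (it will no longer yield anything).

This behavior is described in the docs: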

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list

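Your two listings illustrate this. In the first, list is called on x[1] (the group iterator) right away, before groupby moves on to the next group. In the second, all the iterators are produced first by the list call wrapped around the groupby call, and the inner iterators are consumed only afterwards, while iterating over that list. Note that the iterator for the last group ([9]) does still work!

Here is a simple example: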

import itertools

groupby_iter = itertools.groupby([1,1,2,2])
first_val, first_group = next(groupby_iter)

# right now, we can iterate on `first_group`:
print(next(first_group)) # prints 1

# but if we advance groupby_iter to the next group...
second_val, second_group = next(groupby_iter)

# first_group is now invalid (it won't yield the second 1)
print(next(first_group)) # raises StopIteration
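For contrast, a sketch of the fix the documentation suggests: copy each group into a list before advancing groupby. The names first_items and second_items are only illustrative.

```python
import itertools

groupby_iter = itertools.groupby([1, 1, 2, 2])

# Copy the first group into a list while groupby is still positioned on it.
first_val, first_group = next(groupby_iter)
first_items = list(first_group)

# Advancing no longer matters: first_items is an independent list.
second_val, second_group = next(groupby_iter)
second_items = list(second_group)

print(first_val, first_items)    # 1 [1, 1]
print(second_val, second_items)  # 2 [2, 2]
```

The lists hold copies of the elements, so they stay usable no matter how far the groupby object has advanced.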
