Iterate an iterator by chunks (of n) in Python?
Can you think of a nice way (maybe with itertools) to split an iterator into chunks of given size?
For example, l = [1, 2, 3, 4, 5, 6, 7] with chunks(l, 3) becomes an iterator yielding [1, 2, 3], [4, 5, 6], [7].
I can write a small program to do that, but I cannot think of a nice way to do it with, say, itertools.
Solution 1:[1]
The grouper() recipe from the itertools documentation's recipes comes close to what you want:
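from itertools import izip_longest  # Python 2; on Python 3 use zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    # n references to one iterator, so each tuple takes n consecutive items
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)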
It will fill up the last chunk with a fill value, though.
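On Python 3, `izip_longest` is named `zip_longest`. As a minimal sketch, keeping this answer's argument order, the same recipe might look like the following; it shows how the last chunk gets padded:

```python
from itertools import zip_longest  # Python 3 name for izip_longest


def grouper(n, iterable, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one."""
    # The same iterator object repeated n times: each output tuple
    # therefore pulls n consecutive items from the input.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


chunks = list(grouper(3, [1, 2, 3, 4, 5, 6, 7]))
print(chunks)  # the last tuple is padded with None
```

If `None` can be a legitimate item in your data, pass a different `fillvalue` so the padding stays distinguishable.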
A less general solution that only works on sequences but does handle the last chunk as desired is
[my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
Finally, a solution that works on general iterators and behaves as desired is
import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        # Take up to n items; an empty tuple means the input is exhausted.
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk
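Because this version only ever pulls n items at a time, it should also work on generators that never end. A small sketch, where `squares` is a hypothetical endless generator introduced only for illustration:

```python
import itertools


def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk


def squares():
    # Hypothetical endless generator, used only for illustration.
    i = 0
    while True:
        yield i * i
        i += 1


# Only the first two chunks are ever computed.
first_two = list(itertools.islice(grouper(3, squares()), 2))
print(first_two)
```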
Solution 2:[2]
Although the OP asks for a function that returns chunks as lists or tuples, Sven Marnach's solution can be modified if you need the chunks as iterators instead:
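import itertools

def grouper_it(n, iterable):
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            # Peek at the first element to detect an exhausted input.
            first_el = next(chunk_it)
        except StopIteration:
            return
        # Put the peeked element back in front of the rest of the chunk.
        yield itertools.chain((first_el,), chunk_it)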
Some benchmarks: http://pastebin.com/YkKFvm8b
It will be slightly more efficient only if your function iterates through elements in every chunk.
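One consequence of yielding iterators is worth seeing in practice: every chunk reads from the same underlying iterator, so each chunk has to be consumed before the next one is requested. The following sketch repeats `grouper_it` and compares the two consumption orders:

```python
import itertools


def grouper_it(n, iterable):
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)


# Consuming each chunk before asking for the next one works as intended.
in_order = [list(chunk) for chunk in grouper_it(3, range(7))]

# Collecting the chunk iterators first and reading them later does not:
# every chunk shares the same underlying iterator, which is exhausted
# by the time the chunks are finally read.
deferred = [list(chunk) for chunk in list(grouper_it(3, range(7)))]

print(in_order)
print(deferred)
```

In the deferred case each chunk contains only the single element that was peeked when it was created.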
Solution 3:[3]
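This will work on any iterator (pass iter(x) for a list or other iterable, since next() is called on the argument directly). It returns a generator of generators (for full flexibility). I now realize that it's basically the same as @reclosedevs solution, but without the fluff. There is no try...except, as the StopIteration raised on an exhausted input propagates up and ends the generator, which is what we want. (This relies on behavior from before Python 3.7: since PEP 479, a StopIteration that escapes a generator body is converted to a RuntimeError.)
The next(iterable) call is needed to raise the StopIteration when the iterable is empty, since islice will continue spawning empty generators forever if you let it.
It's better because it's only two lines long, yet easy to comprehend.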
import itertools

def grouper(iterable, n):
    while True:
        yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))
Note that next(iterable) is put into a tuple. Otherwise, if next(iterable) itself were iterable, then itertools.chain would flatten it out. Thanks to Jeremy Brown for pointing out this issue.
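On Python 3.7 and later, PEP 479 turns a StopIteration that leaks out of a generator body into a RuntimeError, so the bare next() call needs a guard there. A possible adaptation (not the answer author's code) that also accepts plain iterables:

```python
import itertools


def grouper(iterable, n):
    it = iter(iterable)  # accept any iterable, not only iterators
    while True:
        try:
            first = next(it)
        except StopIteration:
            return  # input exhausted: end the generator cleanly
        # Wrap `first` in a tuple so chain() does not flatten it.
        yield itertools.chain((first,), itertools.islice(it, n - 1))


result = [list(chunk) for chunk in grouper([1, 2, 3, 4, 5, 6, 7], 3)]
print(result)
```

This gives up the two-line form in exchange for running unchanged on current Python versions.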
Solution 4:[4]
I was working on something today and came up with what I think is a simple solution. It is similar to jsbueno's answer, but I believe his would yield empty groups when the length of iterable is divisible by n. My answer does a simple check when the iterable is exhausted.
def chunk(iterable, chunk_size):
    """Generates lists of `chunk_size` elements from `iterable`.

    >>> list(chunk((2, 3, 5, 7), 3))
    [[2, 3, 5], [7]]
    >>> list(chunk((2, 3, 5, 7), 2))
    [[2, 3], [5, 7]]
    """
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))
            yield chunk
        except StopIteration:
            if chunk:
                yield chunk
            break
Solution 5:[5]
Here's one that returns lazy chunks; use map(list, chunks(...)) if you want lists.
from itertools import islice, chain
from collections import deque

def chunks(items, n):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, n-1))
        yield chunk
        # Drain whatever the caller left unread so the next chunk starts correctly.
        deque(chunk, 0)

if __name__ == "__main__":
    for chunk in map(list, chunks(range(10), 3)):
        print(chunk)
    for i, chunk in enumerate(chunks(range(10), 3)):
        if i % 2 == 1:
            print("chunk #%d: %s" % (i, list(chunk)))
        else:
            print("skipping #%d" % i)
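The `deque(chunk, 0)` line is what makes skipping safe: it throws away anything the caller did not read, so the next chunk begins at the right offset. A short sketch that reads only the first element of each chunk illustrates this:

```python
from collections import deque
from itertools import chain, islice


def chunks(items, n):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, n - 1))
        yield chunk
        # Discard whatever the caller left unread in this chunk.
        deque(chunk, maxlen=0)


# Read only the first element of every chunk.
heads = [next(chunk) for chunk in chunks(range(10), 3)]
print(heads)
```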
Solution 6:[6]
A succinct implementation is:
from itertools import ifilterfalse, izip_longest  # Python 2 names
chunker = lambda iterable, n: (ifilterfalse(lambda x: x == (), chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=())))
This works because [iter(iterable)]*n is a list containing the same iterator n times; zipping over that takes one item from each iterator in the list, which is the same iterator, with the result that each zip-element contains a group of n items.
izip_longest is needed to fully consume the underlying iterable, rather than iteration stopping when the first exhausted iterator is reached, which chops off any remainder from iterable. This results in the need to filter out the fill-value. A slightly more robust implementation would therefore be:
def chunker(iterable, n):
    class Filler(object): pass
    return (ifilterfalse(lambda x: x is Filler, chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=Filler)))
This guarantees that the fill value is never an item in the underlying iterable. Using the definition above:
>>> iterable = range(1,11)
>>> map(tuple,chunker(iterable, 3))
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]
>>> map(tuple,chunker(iterable, 2))
[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
>>> map(tuple,chunker(iterable, 4))
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10)]
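The session above is Python 2. On Python 3 the same idea could be written with `zip_longest` and `filterfalse`; in this sketch `_FILLER` is a hypothetical sentinel name, and a plain `object()` replaces the Filler class:

```python
from itertools import filterfalse, zip_longest

_FILLER = object()  # hypothetical sentinel name; unique by identity


def chunker(iterable, n):
    groups = zip_longest(*[iter(iterable)] * n, fillvalue=_FILLER)
    # Strip the sentinel padding from each group lazily.
    return (filterfalse(lambda x: x is _FILLER, group) for group in groups)


result = [tuple(chunk) for chunk in chunker(range(1, 11), 3)]
print(result)
```

Comparing with `is` rather than `==` means items such as `None` or `()` in the input are left alone.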
This implementation almost does what you want, but it has issues:
def chunks(it, step):
    start = 0
    while True:
        end = start+step
        yield islice(it, start, end)
        start = end
(The issues are these: because islice does not raise StopIteration or anything else on calls that go beyond the end of it, this will yield forever; and, slightly trickier, the islice results must be consumed before this generator is advanced.)
To generate the moving window functionally:
izip(count(0, step), count(step, step))
So this becomes:
(it[start:end] for (start,end) in izip(count(0, step), count(step, step)))
But that still creates an infinite iterator, so you need takewhile (or perhaps something better) to limit it:
chunk = lambda it, step: takewhile((lambda x: len(x) > 0), (it[start:end] for (start,end) in izip(count(0, step), count(step, step))))
>>> g = chunk(range(1,11), 3)
>>> tuple(g)
([1, 2, 3], [4, 5, 6], [7, 8, 9], [10])
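`izip` does not exist on Python 3, where the built-in `zip` is already lazy. A rough Python 3 equivalent of the takewhile approach, still limited to sequences that support slicing, might be:

```python
from itertools import count, takewhile


def chunk(seq, step):
    # Lazily pair up (0, step), (step, 2*step), ... as slice bounds.
    windows = zip(count(0, step), count(step, step))
    slices = (seq[start:end] for start, end in windows)
    # Stop at the first empty slice, i.e. once start passes len(seq).
    return takewhile(lambda s: len(s) > 0, slices)


result = list(chunk(list(range(1, 11)), 3))
print(result)
```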
Solution 7:[7]
I forget where I found the inspiration for this. I've modified it a little to work with MSI GUIDs in the Windows Registry:
def nslice(s, n, truncate=False, reverse=False):
    """Splits s into n-sized chunks, optionally reversing the chunks."""
    assert n > 0
    while len(s) >= n:
        if reverse: yield s[:n][::-1]
        else: yield s[:n]
        s = s[n:]
    if len(s) and not truncate:
        yield s
reverse doesn't apply to your question, but it's something I use extensively with this function.
>>> [i for i in nslice([1,2,3,4,5,6,7], 3)]
[[1, 2, 3], [4, 5, 6], [7]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True)]
[[1, 2, 3], [4, 5, 6]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True, reverse=True)]
[[3, 2, 1], [6, 5, 4]]
Solution 8:[8]
Here you go.
def chunksiter(l, chunks):
    # Eager variant: build every chunk, then return an iterator over them.
    i = 0
    rl = []
    while i < len(l):
        rl.append(l[i:i+chunks])
        i += chunks
    return iter(rl)

def chunksiter2(l, chunks):
    # Lazy variant: slice out one chunk at a time.
    i = 0
    while i < len(l):
        yield l[i:i+chunks]
        i += chunks
Examples:
for l in chunksiter([1,2,3,4,5,6,7,8],3):
    print(l)
[1, 2, 3]
[4, 5, 6]
[7, 8]
for l in chunksiter2([1,2,3,4,5,6,7,8],3):
    print(l)
[1, 2, 3]
[4, 5, 6]
[7, 8]
for l in chunksiter2([1,2,3,4,5,6,7,8],5):
    print(l)
[1, 2, 3, 4, 5]
[6, 7, 8]
Solution 9:[9]
"Simple is better than complex" - a straightforward generator a few lines long can do the job. Just place it in a utilities module or similar:
def grouper(iterable, n):
    iterable = iter(iterable)
    count = 0
    group = []
    while True:
        try:
            group.append(next(iterable))
            count += 1
            if count % n == 0:
                yield group
                group = []
        except StopIteration:
            # Emit the trailing partial group, if there is one.
            if group:
                yield group
            break
Solution 10:[10]
Since Python 3.8, there is a simpler solution using the := operator:
import itertools
from typing import Iterator

def grouper(it: Iterator, n: int) -> Iterator[list]:
    while chunk := list(itertools.islice(it, n)):
        yield chunk
Usage:
>>> list(grouper(iter('ABCDEFG'), 3))
[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
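Note that the usage above passes `iter('ABCDEFG')`, not the string itself. Given a list or string directly, each `islice` call would start from the beginning again and the loop would never end. A variant that accepts any iterable could call `iter()` itself, as in this sketch (requires Python 3.8+ for `:=`):

```python
import itertools
from typing import Iterable, Iterator


def grouper(iterable: Iterable, n: int) -> Iterator[list]:
    it = iter(iterable)  # without this, a list would restart on each islice
    while chunk := list(itertools.islice(it, n)):
        yield chunk


result = list(grouper([1, 2, 3, 4, 5, 6, 7], 3))
print(result)
```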
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow