zip_longest for the left list always
I know about the zip function (which will zip according to the shortest list) and zip_longest (which will zip according to the longest list), but how would I zip according to the first list, regardless of whether it's the longest or not?
For example:
Input: ['a', 'b', 'c'], [1, 2]
Output: [('a', 1), ('b', 2), ('c', None)]
But also:
Input: ['a', 'b'], [1, 2, 3]
Output: [('a', 1), ('b', 2)]
Do both of these functionalities exist in one function?
Solution 1:[1]
Solutions
Chaining the repeated fillvalue behind the iterables other than the first:
from itertools import chain, repeat
def zip_first(first, *rest, fillvalue=None):
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))
Or using zip_longest and trimming it with a compress and zip trick:
from itertools import compress, tee, zip_longest
def zip_first(first, *rest, fillvalue=None):
    a, b = tee(first)
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), zip(a))
Just like zip and zip_longest, these take any number (well, at least one) of any kind of iterables (including infinite ones) and return an iterator (convert it to a list if needed).
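A quick check of both versions against the two examples from the question, plus one call with three iterables where one is infinite (itertools.count) and a custom fillvalue. The names zip_first_chain and zip_first_compress are only labels used here to tell the two versions apart; the bodies are the ones given above.

```python
from itertools import chain, compress, count, repeat, tee, zip_longest

def zip_first_chain(first, *rest, fillvalue=None):
    # Pad every non-first iterable with an endless run of fillvalue,
    # so plain zip stops exactly when `first` runs out.
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))

def zip_first_compress(first, *rest, fillvalue=None):
    # One copy of `first` supplies the values, the other one only counts them.
    a, b = tee(first)
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), zip(a))

for fn in (zip_first_chain, zip_first_compress):
    # First list is longer: the shorter one gets padded.
    print(list(fn(['a', 'b', 'c'], [1, 2])))  # [('a', 1), ('b', 2), ('c', None)]
    # First list is shorter: the extra items are dropped.
    print(list(fn(['a', 'b'], [1, 2, 3])))  # [('a', 1), ('b', 2)]
    # Three iterables, one of them infinite, with a custom fillvalue.
    print(list(fn('abc', count(), 'x', fillvalue='-')))  # [('a', 0, 'x'), ('b', 1, '-'), ('c', 2, '-')]
```

Both calls with count() terminate because only the first iterable decides the length.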
Benchmark results
Benchmarks with other equally general solutions (all code is at the end of the answer). Each row lists the three best of five timing rounds, then the function name:
10 iterables of 10,000 to 90,000 elements, first has 50,000:
2.2 ms 2.2 ms 2.3 ms limit_cheat
2.6 ms 2.6 ms 2.6 ms Kelly_Bundy_chain
3.3 ms 3.3 ms 3.3 ms Kelly_Bundy_compress
50.2 ms 50.6 ms 50.7 ms CrazyChucky
54.7 ms 55.0 ms 55.0 ms Sven_Marnach
74.8 ms 74.9 ms 75.0 ms Mad_Physicist
5.4 ms 5.4 ms 5.4 ms Kelly_Bundy_3
5.9 ms 6.0 ms 6.0 ms Kelly_Bundy_4
4.6 ms 4.7 ms 4.7 ms Kelly_Bundy_5
10,000 iterables of 0 to 100 elements, first has 50:
4.6 ms 4.7 ms 4.8 ms limit_cheat
4.8 ms 4.8 ms 4.8 ms Kelly_Bundy_compress
8.4 ms 8.4 ms 8.4 ms Kelly_Bundy_chain
27.1 ms 27.3 ms 27.5 ms CrazyChucky
38.3 ms 38.5 ms 38.7 ms Sven_Marnach
73.0 ms 73.0 ms 73.1 ms Mad_Physicist
4.9 ms 4.9 ms 5.0 ms Kelly_Bundy_3
4.9 ms 4.9 ms 5.0 ms Kelly_Bundy_4
5.0 ms 5.0 ms 5.0 ms Kelly_Bundy_5
The first one, limit_cheat, is a cheat that knows the length in advance; it's included to show what's probably a limit for how fast we can get.
Explanations
A little explanation of the above two solutions:
The first solution, if used with, for example, three iterables, is equivalent to this:
from itertools import chain, repeat
def zip_first(first, second, third, fillvalue=None):
    filler = repeat(fillvalue)
    return zip(first,
               chain(second, filler),
               chain(third, filler))
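The general version builds those chain(..., filler) pairs with map. A small sketch with two made-up "rest" lists shows what map(chain, rest, repeat(repeat(fillvalue))) produces, and that the outer repeat hands out one shared filler object, just like the explicit version above.

```python
from itertools import chain, islice, repeat

rest = (['x', 'y'], [1])
fillvalue = None

# repeat(repeat(fillvalue)) yields the very same infinite filler object
# each time, so map calls chain(iterable, filler) once per iterable in rest.
padded = list(map(chain, rest, repeat(repeat(fillvalue))))

# Every padded iterator is now infinite; peek at the first four items of each.
previews = [list(islice(it, 4)) for it in padded]
print(previews)  # [['x', 'y', None, None], [1, None, None, None]]

# The outer repeat really does yield one shared object.
fillers = list(islice(repeat(repeat(fillvalue)), 2))
print(fillers[0] is fillers[1])  # True
```

Sharing one filler is harmless: it never runs out and always yields the same value.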
The second solution basically lets zip_longest do the job. The only problem with that is that it doesn't stop when the first iterable is done. So I duplicate the first iterable (with tee) and then use one copy for its elements and the other for its length. The zip(a) wraps every element in a 1-tuple, and non-empty tuples are true. So compress gives me all tuples produced by zip_longest, as many as there are elements in the first iterable.
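To see why the selectors are zip(a) rather than a itself, here is the trick traced on small lists. Elements of the first iterable can be falsy (0 below), and compress would drop those rows if it used them directly as selectors. Lists are used only so the intermediate values can be printed; the real function needs tee because its first argument may be a one-shot iterator.

```python
from itertools import compress, zip_longest

first = [0, 'b']      # note: 0 is falsy
second = [1, 2, 3]

padded = list(zip_longest(first, second))
print(padded)  # [(0, 1), ('b', 2), (None, 3)]

# Using the raw elements as selectors wrongly drops the row for 0.
wrong = list(compress(padded, first))
print(wrong)  # [('b', 2)]

# Wrapping each element in a 1-tuple makes every selector truthy,
# so compress keeps exactly len(first) rows and then stops.
right = list(compress(padded, zip(first)))
print(right)  # [(0, 1), ('b', 2)]
```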
Benchmark code
def limit_cheat(*iterables, fillvalue=None):
    return islice(zip_longest(*iterables, fillvalue=fillvalue), cheat_length)
def Kelly_Bundy_chain(first, *rest, fillvalue=None):
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))
def Kelly_Bundy_compress(first, *rest, fillvalue=None):
    a, b = tee(first)
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), zip(a))
def CrazyChucky(*iterables, fillvalue=None):
    SENTINEL = object()
    for first, *others in zip_longest(*iterables, fillvalue=SENTINEL):
        if first is SENTINEL:
            return
        others = [i if i is not SENTINEL else fillvalue for i in others]
        yield (first, *others)
def Sven_Marnach(first, *rest, fillvalue=None):
    rest = [iter(r) for r in rest]
    for x in first:
        yield x, *(next(r, fillvalue) for r in rest)
def Mad_Physicist(*args, fillvalue=None):
    # zip_by_first('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    # zip_by_first('ABC', 'xyzw', fillvalue='-') --> Ax By Cz
    if not args:
        return
    iterators = [iter(it) for it in args]
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)
def Kelly_Bundy_3(first, *rest, fillvalue=None):
    a, b = tee(first)
    return map(itemgetter(1), zip(a, zip_longest(b, *rest, fillvalue=fillvalue)))
def Kelly_Bundy_4(first, *rest, fillvalue=None):
    sentinel = object()
    for z in zip_longest(chain(first, [sentinel]), *rest, fillvalue=fillvalue):
        if z[0] is sentinel:
            break
        yield z
def Kelly_Bundy_5(first, *rest, fillvalue=None):
    stopped = False
    # Generator function (because of the unreachable yield): chain only runs it
    # once `first` is exhausted, and it then sets the flag and yields nothing.
    def stop():
        nonlocal stopped
        stopped = True
        return
        yield
    for z in zip_longest(chain(first, stop()), *rest, fillvalue=fillvalue):
        if stopped:
            break
        yield z
import timeit
from itertools import chain, repeat, zip_longest, islice, tee, compress
from operator import itemgetter
from collections import deque
funcs = [
limit_cheat,
Kelly_Bundy_chain,
Kelly_Bundy_compress,
CrazyChucky,
Sven_Marnach,
Mad_Physicist,
Kelly_Bundy_3,
Kelly_Bundy_4,
Kelly_Bundy_5,
]
def test(args_creator):
    # Correctness
    expect = list(funcs[0](*args_creator()))
    for func in funcs:
        result = list(func(*args_creator()))
        print(result == expect, func.__name__)
    # Speed
    tss = [[] for _ in funcs]
    for _ in range(5):
        print()
        print(args_creator.__name__)
        for func, ts in zip(funcs, tss):
            t = min(timeit.repeat(lambda: deque(func(*args_creator()), 0), number=1))
            ts.append(t)
            print(*('%4.1f ms ' % (t * 1e3) for t in sorted(ts)[:3]), func.__name__)
def args_few_but_long_iterables():
    global cheat_length
    cheat_length = 50_000
    first = repeat(0, 50_000)
    rest = [repeat(i, 10_000 * i) for i in range(1, 10)]
    return first, *rest
def args_many_but_short_iterables():
    global cheat_length
    cheat_length = 50
    first = repeat(0, 50)
    rest = [repeat(i, i % 101) for i in range(1, 10_000)]
    return first, *rest
test(args_few_but_long_iterables)
funcs[1:3] = funcs[1:3][::-1]
test(args_many_but_short_iterables)
Solution 2:[2]
You can repurpose the "roughly equivalent" Python code shown in the docs for itertools.zip_longest to make a generalized version that zips according to the length of the first argument:
from itertools import repeat
def zip_by_first(*args, fillvalue=None):
    # zip_by_first('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    # zip_by_first('ABC', 'xyzw', fillvalue='-') --> Ax By Cz
    if not args:
        return
    iterators = [iter(it) for it in args]
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)
You might be able to make some small improvements, such as caching repeat(fillvalue). The issue with this implementation is that it's written in Python, while most of itertools uses a much faster C implementation. You can see the effects of this by comparing against Kelly Bundy's answer.
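A sketch of the caching tweak mentioned above: the infinite filler is created once and shared by every iterator that runs out, instead of building a new repeat(fillvalue) each time. The name zip_by_first_cached is made up here, and whether this measurably helps has not been benchmarked; the loop is still pure Python.

```python
from itertools import repeat

def zip_by_first_cached(*args, fillvalue=None):
    # zip_by_first_cached('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    if not args:
        return
    filler = repeat(fillvalue)  # built once, reused for every exhausted input
    iterators = [iter(it) for it in args]
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    return  # the first iterable decides the length
                iterators[i] = filler
                value = fillvalue
            values.append(value)
        yield tuple(values)

print(list(zip_by_first_cached('ABCD', 'xy', fillvalue='-')))
# [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
```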
Solution 3:[3]
Here's another take, if the goal is readable, easy-to-understand code:
def zip_first(first, *rest, fillvalue=None):
    rest = [iter(r) for r in rest]
    for x in first:
        yield x, *(next(r, fillvalue) for r in rest)
This uses the two-argument form of next() to return the fill value for all iterables that are exhausted.
For exactly two iterables, this can be simplified to:
def zip_first(first, second, fillvalue=None):
    second = iter(second)
    for x in first:
        yield x, next(second, fillvalue)
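Both versions rely on next(iterator, default) returning the default instead of raising StopIteration. A tiny demonstration, with 'fill' as an arbitrary default:

```python
it = iter([1])

# With a default, next() returns it instead of raising StopIteration,
# and keeps doing so on every later call.
values = [next(it, 'fill') for _ in range(3)]
print(values)  # [1, 'fill', 'fill']
```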
Solution 4:[4]
Make the second one infinite, and then just use normal zip:
from itertools import chain, repeat
a = ['a', 'b', 'c']
b = [1, 2]
b = chain(b, repeat(None))
print(*zip(a, b))
Solution 5:[5]
Return only len(a) elements from zip_longest:
from itertools import zip_longest
def zip_first(a, b):
    z = zip_longest(a, b)
    for _, r in zip(range(len(a)), z):
        yield r
Solution 6:[6]
A little bit ugly, but I would go with this one. The idea is to shorten the second list to the size of the first one if it is longer. Then we use zip_longest, which guarantees that the result is exactly as long as the first list.
import itertools
input1 = [['a', 'b', 'c'], [1, 2]]
input2 = [['a', 'b'], [1, 2, 3]]
zip1 = itertools.zip_longest(input1[0], input1[1][:len(input1[0])])
zip2 = itertools.zip_longest(input2[0], input2[1][:len(input2[0])])
print(list(zip1))
print(list(zip2))
Output:
[('a', 1), ('b', 2), ('c', None)]
[('a', 1), ('b', 2)]
To zip multiple lists, this can be used:
import itertools
def zip_first(lists):
    # Truncate every list to the first list's length; zip_longest pads shorter ones with None.
    equal_lists = [lst[:len(lists[0])] for lst in lists]
    return itertools.zip_longest(*equal_lists)
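Since zip_longest accepts a fillvalue, the multi-list version could forward one. The fillvalue parameter below is a hypothetical addition, not part of the original answer. Slicing also means every input must be a sequence (list, tuple, string), not an arbitrary iterator.

```python
import itertools

def zip_first(lists, fillvalue=None):
    n = len(lists[0])
    # Slicing requires sequences; the first list is sliced to its own length.
    equal_lists = [lst[:n] for lst in lists]
    return itertools.zip_longest(*equal_lists, fillvalue=fillvalue)

print(list(zip_first([['a', 'b', 'c'], [1, 2], [True]], fillvalue='-')))
# [('a', 1, True), ('b', 2, '-'), ('c', '-', '-')]
print(list(zip_first([['a', 'b'], [1, 2, 3]])))
# [('a', 1), ('b', 2)]
```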
Solution 7:[7]
For generic iterators (or lists as well), you can use this. We yield pairs until we hit StopIteration on a. If we hit StopIteration on b first, we use None as the second value.
def zip_first(a, b):
    ai, bi = iter(a), iter(b)
    while True:
        try:
            aa = next(ai)
        except StopIteration:
            return
        try:
            bb = next(bi)
        except StopIteration:
            bb = None
        yield aa, bb
Solution 8:[8]
If the inputs are lists (or other collections that can be used with len), you can use zip_longest and lazily limit the result to the length of the first list¹ by using islice:
from itertools import islice, zip_longest
def zip_first(a, b):
    return islice(zip_longest(a, b), len(a))
¹ This basic idea was taken from the answer by Jan Christoph Terasa.
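Because islice pulls only len(a) items, only the first argument needs a length; the second can be any iterable, even an infinite one such as itertools.count() in this sketch.

```python
from itertools import count, islice, zip_longest

def zip_first(a, b):
    # `a` must support len(); `b` can be any iterable, including an infinite one.
    return islice(zip_longest(a, b), len(a))

print(list(zip_first(['a', 'b', 'c'], count())))
# [('a', 0), ('b', 1), ('c', 2)]
print(list(zip_first(['a', 'b', 'c'], [1, 2])))
# [('a', 1), ('b', 2), ('c', None)]
```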
Solution 9:[9]
I'm not sure there's a built-in, but this works for two lists:
from itertools import zip_longest
first = ['a', 'b', 'c']
last = [1, 2, 3, 4]
if len(first) < len(last):
    b = list(zip(first, last))
else:
    b = list(zip_longest(first, last))
print(b)
Solution 10:[10]
I don't know of a ready-made one, but you can define your own.
Using object() as a sentinel ensures it will always test as unique and never get confused with None or any other fill value. Thus this should behave properly even if either of your iterables contains None.
Like zip_longest, it takes any number of iterables (not necessarily two), and you can specify the fillvalue.
from itertools import zip_longest
def zip_left(*iterables, fillvalue=None):
    SENTINEL = object()
    for first, *others in zip_longest(*iterables, fillvalue=SENTINEL):
        if first is SENTINEL:
            return
        others = [i if i is not SENTINEL else fillvalue for i in others]
        yield (first, *others)
print(list(zip_left(['a', 'b', 'c'], [1, 2])))
print(list(zip_left(['a', 'b'], [1, 2, 3])))
Output:
[('a', 1), ('b', 2), ('c', None)]
[('a', 1), ('b', 2)]
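To illustrate the point about None in the data, here is zip_left next to a hypothetical naive variant (zip_left_naive, written only for this comparison) that uses None itself as the "first iterable is exhausted" marker. With a real None at the start of the first list, the naive version stops immediately.

```python
from itertools import zip_longest

def zip_left(*iterables, fillvalue=None):
    SENTINEL = object()  # unique marker that cannot collide with real data
    for first, *others in zip_longest(*iterables, fillvalue=SENTINEL):
        if first is SENTINEL:
            return
        others = [i if i is not SENTINEL else fillvalue for i in others]
        yield (first, *others)

def zip_left_naive(a, b):
    # Hypothetical counterexample: treats any None in `a` as "a is exhausted".
    for x, y in zip_longest(a, b):
        if x is None:
            return
        yield x, y

data = [None, 'b']
print(list(zip_left(data, [1])))        # [(None, 1), ('b', None)]
print(list(zip_left_naive(data, [1])))  # [] because it stops at the real None
```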
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | |
Solution 4 | wim |
Solution 5 | |
Solution 6 | |
Solution 7 | |
Solution 8 | mkrieger1 |
Solution 9 | Dizaster |
Solution 10 | |