zip_longest for the left list always

I know about the zip function (which will zip according to the shortest list) and zip_longest (which will zip according to the longest list), but how would I zip according to the first list, regardless of whether it's the longest or not?

For example:

Input:  ['a', 'b', 'c'], [1, 2]
Output: [('a', 1), ('b', 2), ('c', None)]

But also:

Input:  ['a', 'b'], [1, 2, 3]
Output: [('a', 1), ('b', 2)]

Do both of these functionalities exist in one function?



Solution 1:[1]

Solutions

Chaining the repeated fillvalue behind the iterables other than the first:

from itertools import chain, repeat

def zip_first(first, *rest, fillvalue=None):
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))

Or use zip_longest and trim its output with a compress and zip trick:

from itertools import compress, tee, zip_longest

def zip_first(first, *rest, fillvalue=None):
    a, b = tee(first)
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), zip(a))

Just like zip and zip_longest, these take any number (well, at least one) of any kind of iterables (including infinite ones) and return an iterator (convert to list if needed).
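To illustrate the point about arbitrary and even infinite iterables, here is a small sketch using the chain-based zip_first from above. The use of itertools.count as the infinite input and the variable names pairs and padded are my own choices for the demonstration.

```python
from itertools import chain, count, repeat

def zip_first(first, *rest, fillvalue=None):
    # Pad every iterable except the first with an endless supply of fillvalue,
    # so zip() can only stop when `first` runs out.
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))

# The second iterable is infinite; the result still ends with the first one.
pairs = list(zip_first("abc", count(1)))

# The second iterable is shorter; the gap is filled with fillvalue.
padded = list(zip_first("abc", [1, 2], fillvalue="-"))

print(pairs)   # [('a', 1), ('b', 2), ('c', 3)]
print(padded)  # [('a', 1), ('b', 2), ('c', '-')]
```

Because the result is an iterator, nothing is consumed until you loop over it or convert it to a list.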

Benchmark results

Benchmarks with other equally general solutions (all code is at the end of the answer):

10 iterables of 10,000 to 90,000 elements, first has 50,000:
------------------------------------------------------------
 2.2 ms   2.2 ms   2.3 ms  limit_cheat
 2.6 ms   2.6 ms   2.6 ms  Kelly_Bundy_chain
 3.3 ms   3.3 ms   3.3 ms  Kelly_Bundy_compress
50.2 ms  50.6 ms  50.7 ms  CrazyChucky
54.7 ms  55.0 ms  55.0 ms  Sven_Marnach
74.8 ms  74.9 ms  75.0 ms  Mad_Physicist
 5.4 ms   5.4 ms   5.4 ms  Kelly_Bundy_3
 5.9 ms   6.0 ms   6.0 ms  Kelly_Bundy_4
 4.6 ms   4.7 ms   4.7 ms  Kelly_Bundy_5

10,000 iterables of 0 to 100 elements, first has 50:
----------------------------------------------------
 4.6 ms   4.7 ms   4.8 ms  limit_cheat
 4.8 ms   4.8 ms   4.8 ms  Kelly_Bundy_compress
 8.4 ms   8.4 ms   8.4 ms  Kelly_Bundy_chain
27.1 ms  27.3 ms  27.5 ms  CrazyChucky
38.3 ms  38.5 ms  38.7 ms  Sven_Marnach
73.0 ms  73.0 ms  73.1 ms  Mad_Physicist
 4.9 ms   4.9 ms   5.0 ms  Kelly_Bundy_3
 4.9 ms   4.9 ms   5.0 ms  Kelly_Bundy_4
 5.0 ms   5.0 ms   5.0 ms  Kelly_Bundy_5

The first one is a cheat that knows the length, included to show what's probably a limit for how fast we can get.

Explanations

A little explanation of the above two solutions:

The first solution, if used with for example three iterables, is equivalent to this:

def zip_first(first, second, third, fillvalue=None):
    filler = repeat(fillvalue)
    return zip(first,
               chain(second, filler),
               chain(third, filler))

The second solution basically lets zip_longest do the job. The only problem with that is that it doesn't stop when the first iterable is done. So I duplicate the first iterable (with tee) and then use one for its elements and the other for its length. The zip(a) wraps every element in a 1-tuple, and non-empty tuples are true. So compress gives me all tuples produced by zip_longest, as many as there are elements in the first iterable.
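The role of zip(a) may be easier to see step by step. The sketch below deliberately uses falsy elements (0 and an empty string) in the first iterable to show that the 1-tuples are still true; the input values and variable names are assumptions chosen for illustration.

```python
from itertools import compress, tee, zip_longest

first = iter([0, ""])   # a one-shot iterator with falsy elements
second = [1, 2, 3]

a, b = tee(first)

# zip(a) would yield (0,) and ("",): non-empty tuples, so both are true
# even though the elements themselves are falsy.
selectors = zip(a)

# On its own, zip_longest would produce three rows here.
rows = zip_longest(b, second)

# compress stops as soon as the selectors run out, i.e. after len(first) rows.
result = list(compress(rows, selectors))
print(result)  # [(0, 1), ('', 2)]
```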

Benchmark code

def limit_cheat(*iterables, fillvalue=None):
    return islice(zip_longest(*iterables, fillvalue=fillvalue), cheat_length)

def Kelly_Bundy_chain(first, *rest, fillvalue=None):
    return zip(first, *map(chain, rest, repeat(repeat(fillvalue))))

def Kelly_Bundy_compress(first, *rest, fillvalue=None):
    a, b = tee(first)
    return compress(zip_longest(b, *rest, fillvalue=fillvalue), zip(a))

def CrazyChucky(*iterables, fillvalue=None):
    SENTINEL = object()
    
    for first, *others in zip_longest(*iterables, fillvalue=SENTINEL):
        if first is SENTINEL:
            return
        others = [i if i is not SENTINEL else fillvalue for i in others]
        yield (first, *others)

def Sven_Marnach(first, *rest, fillvalue=None):
    rest = [iter(r) for r in rest]
    for x in first:
        yield x, *(next(r, fillvalue) for r in rest)

def Mad_Physicist(*args, fillvalue=None):
    # zip_by_first('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    # zip_by_first('ABC', 'xyzw', fillvalue='-') --> Ax By Cz
    if not args:
        return
    iterators = [iter(it) for it in args]
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)

def Kelly_Bundy_3(first, *rest, fillvalue=None):
    a, b = tee(first)
    return map(itemgetter(1), zip(a, zip_longest(b, *rest, fillvalue=fillvalue)))

def Kelly_Bundy_4(first, *rest, fillvalue=None):
    sentinel = object()
    for z in zip_longest(chain(first, [sentinel]), *rest, fillvalue=fillvalue):
        if z[0] is sentinel:
            break
        yield z

def Kelly_Bundy_5(first, *rest, fillvalue=None):
    stopped = False
    def stop():
        nonlocal stopped
        stopped = True
        return
        yield
    for z in zip_longest(chain(first, stop()), *rest, fillvalue=fillvalue):
        if stopped:
            break
        yield z


import timeit
from itertools import chain, repeat, zip_longest, islice, tee, compress
from operator import itemgetter
from collections import deque

funcs = [
    limit_cheat,
    Kelly_Bundy_chain,
    Kelly_Bundy_compress,
    CrazyChucky,
    Sven_Marnach,
    Mad_Physicist,
    Kelly_Bundy_3,
    Kelly_Bundy_4,
    Kelly_Bundy_5,
]

def test(args_creator):

    # Correctness
    expect = list(funcs[0](*args_creator()))
    for func in funcs:
        result = list(func(*args_creator()))
        print(result == expect, func.__name__)
    
    # Speed
    tss = [[] for _ in funcs]
    for _ in range(5):
        print()
        print(args_creator.__name__)
        for func, ts in zip(funcs, tss):
            t = min(timeit.repeat(lambda: deque(func(*args_creator()), 0), number=1))
            ts.append(t)
            print(*('%4.1f ms ' % (t * 1e3) for t in sorted(ts)[:3]), func.__name__)

def args_few_but_long_iterables():
    global cheat_length
    cheat_length = 50_000
    first = repeat(0, 50_000)
    rest = [repeat(i, 10_000 * i) for i in range(1, 10)]
    return first, *rest

def args_many_but_short_iterables():
    global cheat_length
    cheat_length = 50
    first = repeat(0, 50)
    rest = [repeat(i, i % 101) for i in range(1, 10_000)]
    return first, *rest

test(args_few_but_long_iterables)
funcs[1:3] = funcs[1:3][::-1]
test(args_many_but_short_iterables)

Solution 2:[2]

You can repurpose the "roughly equivalent" python code shown in the docs for itertools.zip_longest to make a generalized version that zips according to the length of the first argument:

from itertools import repeat

def zip_by_first(*args, fillvalue=None):
    # zip_by_first('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    # zip_by_first('ABC', 'xyzw', fillvalue='-') --> Ax By Cz
    if not args:
        return
    iterators = [iter(it) for it in args]
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)

You might be able to make some small improvements, such as caching repeat(fillvalue). The main drawback of this implementation is that it's written in pure Python, while most of itertools is implemented in much faster C. You can see the effect of this by comparing against Kelly Bundy's answer.
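One way the caching of repeat(fillvalue) might look is sketched below. The name zip_by_first_cached is hypothetical, and I have not benchmarked it, so treat it as an illustration of the idea rather than a measured improvement.

```python
from itertools import repeat

def zip_by_first_cached(*args, fillvalue=None):
    if not args:
        return
    # Created once and shared: repeat() is endless, so every exhausted
    # slot can safely draw from the same object.
    filler = repeat(fillvalue)
    iterators = [iter(it) for it in args]
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                if i == 0:
                    return  # the first iterable decides the length
                iterators[i] = filler
                value = fillvalue
            values.append(value)
        yield tuple(values)

print(list(zip_by_first_cached('ABCD', 'xy', fillvalue='-')))
print(list(zip_by_first_cached('ABC', 'xyzw', fillvalue='-')))
```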

Solution 3:[3]

Here's another take, if the goal is readable, easy to understand code:

def zip_first(first, *rest, fillvalue=None):
    rest = [iter(r) for r in rest]
    for x in first:
        yield x, *(next(r, fillvalue) for r in rest)

This uses the two-argument form of next() to return the fill value for all iterables that are exhausted.

For exactly two iterables, this can be simplified to

def zip_first(first, second, fillvalue=None):
    second = iter(second)
    for x in first:
        yield x, next(second, fillvalue)
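The two-argument form of next() that both versions rely on can be seen in isolation in this short sketch. The values and the "fill" placeholder are arbitrary choices for the demonstration.

```python
it = iter([1, 2])

# Once the iterator is exhausted, next(it, default) keeps returning the
# default instead of raising StopIteration.
values = [next(it, "fill") for _ in range(4)]
print(values)  # [1, 2, 'fill', 'fill']
```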

Solution 4:[4]

Make the second one infinite, and then just use normal zip:

from itertools import chain, repeat

a = ['a', 'b', 'c']
b = [1, 2]

b = chain(b, repeat(None))

print(*zip(a, b))

Solution 5:[5]

Return only len(a) elements from zip_longest (this requires a to support len(), so it works for lists but not for generators):

from itertools import zip_longest

def zip_first(a, b):
    z = zip_longest(a, b)
    for i, r in zip(range(len(a)), z):
        yield r

Solution 6:[6]

A little bit ugly, but I would go with this one. The idea is to shorten the second list to the size of the first one if it is longer. Then we use zip_longest guaranteeing that the result is at least as long as the first argument of zip.

import itertools

input1 = [['a', 'b', 'c'], [1, 2]]
input2 = [['a', 'b'], [1, 2, 3]]

zip1 = itertools.zip_longest(input1[0], input1[1][:len(input1[0])])
zip2 = itertools.zip_longest(input2[0], input2[1][:len(input2[0])])

print(list(zip1))
print(list(zip2))

Output:

[('a', 1), ('b', 2), ('c', None)]
[('a', 1), ('b', 2)]

To zip multiple lists this can be used:

import itertools

def zip_first(lists):
    equal_lists = [l[:len(lists[0])] for l in lists]
    return itertools.zip_longest(*equal_lists)

Solution 7:[7]

For generic iterators (or lists as well), you can use this. We yield pairs until we hit StopIteration on a. If we hit StopIteration on b first, we use None as the second value.

def zip_first(a, b):
    ai, bi = iter(a), iter(b)
    while True:
        try:
            aa = next(ai)
        except StopIteration:
            return
        try:
            bb = next(bi)
        except StopIteration:
            bb = None
        yield aa, bb

Solution 8:[8]

If the inputs are lists (or other collections which can be used with len), you can use zip_longest and lazily limit the result to the length of the first list (see the note below) by using islice:

from itertools import islice, zip_longest

def zip_first(a, b):
    return islice(zip_longest(a, b), len(a))

Note: this basic idea was taken from the answer by Jan Christoph Terasa.
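The restriction to inputs that support len can be checked with a short sketch. It reuses the islice-based zip_first from this answer; the variable names and the generator input are assumptions for the demonstration.

```python
from itertools import islice, zip_longest

def zip_first(a, b):
    return islice(zip_longest(a, b), len(a))

# Works for a list as the first argument.
from_list = list(zip_first(['a', 'b', 'c'], [1, 2]))
print(from_list)  # [('a', 1), ('b', 2), ('c', None)]

# A generator has no len(), so the call fails immediately.
rejected = False
try:
    zip_first((c for c in "abc"), [1, 2])
except TypeError:
    rejected = True
print(rejected)  # True
```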

Solution 9:[9]

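I'm not sure this is the best approach, but you can compare the lengths and choose between zip and zip_longest:

from itertools import zip_longest

first = ['a', 'b', 'c']
last = [1, 2, 3, 4]
if len(first) < len(last):
    b = list(zip(first, last))
else:
    b = list(zip_longest(first, last))
print(b)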

Solution 10:[10]

I don't know of one readymade, but you can define your own.

Using object() as a sentinel ensures it will always test as unique, and never get confused with None or any other fill value. Thus this should behave properly even if either of your iterables contains None.

Like zip_longest, it takes any number of iterables (not necessarily two), and you can specify the fillvalue.

from itertools import zip_longest

def zip_left(*iterables, fillvalue=None):
    SENTINEL = object()
    
    for first, *others in zip_longest(*iterables, fillvalue=SENTINEL):
        if first is SENTINEL:
            return
        others = [i if i is not SENTINEL else fillvalue for i in others]
        yield (first, *others)


print(list(zip_left(['a', 'b', 'c'], [1, 2])))
print(list(zip_left(['a', 'b'], [1, 2, 3])))

Output:

[('a', 1), ('b', 2), ('c', None)]
[('a', 1), ('b', 2)]
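To see why the sentinel matters, the following sketch feeds zip_left a first iterable that itself contains None. The function is a lightly condensed restatement of the one above, and the input values are assumptions chosen for the demonstration.

```python
from itertools import zip_longest

def zip_left(*iterables, fillvalue=None):
    SENTINEL = object()  # unique object, cannot collide with real data
    for first, *others in zip_longest(*iterables, fillvalue=SENTINEL):
        if first is SENTINEL:
            return
        # Swap the private sentinel for the caller's fillvalue.
        yield (first, *(fillvalue if o is SENTINEL else o for o in others))

# None values in the first iterable are ordinary data and do not end the zip.
result = list(zip_left([None, 'b', None], [1], fillvalue='-'))
print(result)  # [(None, 1), ('b', '-'), (None, '-')]
```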

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3
Solution 4 wim
Solution 5
Solution 6
Solution 7
Solution 8 mkrieger1
Solution 9 Dizaster
Solution 10