Python3 merge tuples in a list and remove duplicates

A similar question may have been asked before, but I wasn't able to find the one I was looking for.

Input = [('Icecream', 'Vanilla'), ('Icecream', 'Chocolate'), ('Icecream', 'Strawberry')]
Output = [('Icecream', ['Vanilla', 'Chocolate', 'Strawberry'])]

Basically, given a list of tuples, I need to merge the tuples that share the same first element into a list of tuples without duplicates, where the second element of each tuple must be a list.

The Input list might contain more items, like below:

Input = [('Icecream', 'Vanilla'), ('Icecream', 'Chocolate'), ('Icecream', 'Strawberry'), ('Veggie', 'Carrot'), ('Milk', 'whole'), ('Milk', 'formula')]

Output = [('Icecream', ['Vanilla', 'Chocolate', 'Strawberry']), ('Veggie', ['Carrot']), ('Milk', ['whole', 'formula'])]


Solution 1:[1]

I think the best approach to your problem is to convert Input into a dictionary.

Input = [('Icecream', 'Vanilla'), ('Icecream', 'Chocolate'), ('Icecream', 'Strawberry'), ('Veggie', 'Carrot'), ('Milk', 'whole'), ('Milk', 'formula')]

new_dict = {}
for item in Input:
    key, *values = item
    if key not in new_dict:
        new_dict[key] = []
    new_dict[key].append(*values)
print(new_dict)

Output:

{'Icecream': ['Vanilla', 'Chocolate', 'Strawberry'], 'Veggie': ['Carrot'], 'Milk': ['whole', 'formula']}

Using this approach, you can easily get the items you need:

icecream_flavors = new_dict["Icecream"]
print(icecream_flavors)

Output:

['Vanilla', 'Chocolate', 'Strawberry']

But if you really want a list of tuples, just convert the dictionary to a list:

new_list = list(new_dict.items())
print(new_list)

Output:

[('Icecream', ['Vanilla', 'Chocolate', 'Strawberry']), ('Veggie', ['Carrot']), ('Milk', ['whole', 'formula'])]
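Note that the loop above keeps repeated values if the same pair appears twice. A minimal sketch of the same dictionary approach that also skips duplicates, assuming Python 3.7+ (so the dictionary keeps insertion order); the function name `merge_pairs` is made up for illustration:

```python
def merge_pairs(pairs):
    """Group (key, value) pairs by key, dropping repeated values per key."""
    merged = {}
    for key, value in pairs:
        # setdefault returns the existing list or stores a new empty one
        values = merged.setdefault(key, [])
        if value not in values:  # skip duplicates, keep first-seen order
            values.append(value)
    return list(merged.items())

pairs = [('Icecream', 'Vanilla'), ('Icecream', 'Vanilla'), ('Milk', 'whole')]
result = merge_pairs(pairs)
print(result)  # [('Icecream', ['Vanilla']), ('Milk', ['whole'])]
```

The `value not in values` check is linear in the list length, which is fine for short lists like these.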

Solution 2:[2]

Given the assumption that the first item in the tuples will always be the same, this will do what you need.

Input = [('Icecream', 'Vanilla'), ('Icecream', 'Chocolate'), ('Icecream', 'Strawberry')]

flavors = []
for i in Input:
    if i[1] not in flavors:
        flavors.append(i[1])

Output = [(Input[0][0], flavors)]
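Under the same single-key assumption, the loop can also be written with `dict.fromkeys`, which drops repeated values while keeping first-seen order (Python 3.7+). This is only an alternative sketch, and the sample data with a repeated 'Vanilla' is made up for illustration:

```python
Input = [('Icecream', 'Vanilla'), ('Icecream', 'Chocolate'), ('Icecream', 'Vanilla')]

# dict.fromkeys keeps only the first occurrence of each value, in order
flavors = list(dict.fromkeys(value for _, value in Input))

Output = [(Input[0][0], flavors)]
print(Output)  # [('Icecream', ['Vanilla', 'Chocolate'])]
```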

Solution 3:[3]

A defaultdict is very handy for storing this type of data structure. The list in the requested output can then easily be created from it using a list comprehension.

If the ordering is not important, a set would be the most logical value type, leading to the following code.

from collections import defaultdict

Input = [
    ('Icecream', 'Vanilla'),
    ('Icecream', 'Chocolate'),
    ('Icecream', 'Strawberry'),
    ('Veggie', 'Carrot'),
    ('Milk', 'whole'),
    ('Milk', 'formula')
]

dd = defaultdict(set)
for key, value in Input:
    dd[key].add(value)
Output = [(key, list(values)) for key, values in dd.items()]

If ordering is important, a dictionary (with arbitrary values, such as True) could be used instead, since dictionaries preserve insertion order as of Python 3.7 (and already in CPython 3.6 as an implementation detail):

dd = defaultdict(dict)
for key, value in Input:
    dd[key][value] = True
Output = [(key, list(values)) for key, values in dd.items()]

Output of the first is, for example (the order of the values taken from each set is not guaranteed):

[('Icecream', ['Chocolate', 'Strawberry', 'Vanilla']), ('Veggie', ['Carrot']), ('Milk', ['whole', 'formula'])]

Output of the latter is identical to the output in the question:

[('Icecream', ['Vanilla', 'Chocolate', 'Strawberry']), ('Veggie', ['Carrot']), ('Milk', ['whole', 'formula'])]
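If the set-based variant is preferred but a reproducible result is wanted, one option is to sort each set when building the output. A minimal sketch, assuming alphabetical order is acceptable; the names `pairs`, `groups` and `output` are chosen here for illustration:

```python
from collections import defaultdict

pairs = [
    ('Icecream', 'Vanilla'),
    ('Icecream', 'Chocolate'),
    ('Milk', 'whole'),
    ('Milk', 'formula'),
    ('Milk', 'whole'),  # duplicate, dropped by the set
]

groups = defaultdict(set)
for key, value in pairs:
    groups[key].add(value)

# sorted() turns each set into a list with a deterministic order
output = [(key, sorted(values)) for key, values in groups.items()]
print(output)  # [('Icecream', ['Chocolate', 'Vanilla']), ('Milk', ['formula', 'whole'])]
```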

Solution 4:[4]

There are 2 different solutions for this. Can the elements of a tuple be swapped with each other? Like this:

Input = [('Vanilla', 'Icecream'), ('Icecream', 'Chocolate'), ('Icecream', 'Strawberry')]

If not, the answer is easy. You can use a counter and a while loop to traverse the list and, for each element, check the elements after it. (A while loop is used because removing items from a list while a for loop iterates over that same list makes the loop skip elements.) Let's code it.

First of all, changing the second elements to lists is important. We want an input like this:

Input = [('Icecream', ['Vanilla']), ('Icecream', ['Chocolate']), ('Icecream', ['Strawberry'])]

To do this, this should work

# Tuples are immutable, so build a new list instead of assigning to tuple[1]
Input = [(key, [value]) for key, value in Input]

After organizing our input, now we can do what you want

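counter = 0
while counter < len(Input):
    # Iterate over a copy (the slice) so removing from Input is safe
    for item in Input[counter+1:]:
        if Input[counter][0] == item[0]:
            # Merge the value, skipping duplicates
            if item[1][0] not in Input[counter][1]:
                Input[counter][1].append(item[1][0])
            Input.remove(item)
    counter = counter + 1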
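As a side note on why the list being modified should not be the one a for loop walks over, here is a small sketch with made-up numbers. Removing an item shifts the later items left, so the loop steps past one of them; iterating over a copy (a slice) avoids this:

```python
# Removing from the same list the for loop iterates over skips elements
items = [1, 2, 2, 3]
for x in items:
    if x == 2:
        items.remove(x)
print(items)  # [1, 2, 3] -> the second 2 was skipped

# Iterating over a copy (a slice) visits every original element
items_copy = [1, 2, 2, 3]
for x in items_copy[:]:
    if x == 2:
        items_copy.remove(x)
print(items_copy)  # [1, 3]
```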

I didn't try it (so there could be mistakes), but I hope you understand what I tried to do. I'm sure you can implement it in your own code with your own style.

By the way, using a dictionary instead of a list of tuples will be much easier. I suggest you read up on dictionaries.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Errol
Solution 2 Kexus
Solution 3 wovano
Solution 4 Enes Eren