'Add together numbers next to duplicate strings that occur across columns for each row

I have a dataset that looks like this:

Col1 Col2 Col3 Col4
Carbon Dioxide 32 Carbon Monoxide 30 Hydrogen 12 Carbon Monoxide 12
Carbon 12 Carbon Dioxide 20 Iron 3 Carbon 6
Sulfur 81 Sulfur 20 Sulfur Dioxide 4 Sulfur 8
Sulfur 81 Sulfur 20 Carbon 4 Sulfur 8
Carbon 81 Sulfur 20 Carbon 4 Sulfur 8

For every row, if there are duplicate strings before each number across the columns, I want to add the numbers next to the text, and rearrange the table, so I get this:

Col1 Col2 Col3 Col4
Carbon Dioxide 32 Carbon Monoxide 42 Hydrogen 12 (blank)
Carbon 18 Carbon Dioxide 20 Iron 3 (blank)
Sulfur 109 Sulfur Dioxide 4 (blank) (blank)
Sulfur 109 Carbon 4 (blank) (blank)
Carbon 85 Sulfur 28 (blank) (blank)

Then I want to pick the row which has the largest number next to it, so I get:

Col1
Carbon Monoxide 42
Carbon Dioxide 20
Sulfur 109
Sulfur 109
Carbon 85

Also, please note, my actual data frame is pretty big (25K+ rows) and I don't know all the possible elements/compounds beforehand.

How can achieve the above in Python?

Any help will be appreciated. Thanks in advance!

#Also, here is the python code for generating the first table above for convenience:

df = pd.DataFrame({
    'Col1': ['Carbon Dioxide 32', 'Carbon 12', 'Sulfur 81', 'Sulfur 81', 'Carbon 81'],
    'Col2': ['Carbon Monoxide 30', 'Carbon Dioxide 20', 'Sulfur 20', 'Sulfur 20', 'Sulfur 20'],
    'Col3': ['Hydrogen 12', 'Iron 3', 'Sulfur Dioxide 4', 'Carbon 4', 'Carbon 4'],
    'Col4': ['Carbon Monoxide 12', 'Carbon 6', 'Sulfur 8', 'Sulfur 8', 'Sulfur 8']
})


Solution 1:[1]

Use:

df = pd.DataFrame({
    'Col1': ['Carbon Dioxide 32', 'Carbon 12', 'Sulfur 81', 'Sulfur 81', 'Carbon 81'],
    'Col2': ['Carbon Monoxide 30', 'Carbon Dioxide 20', 'Sulfur 20', 'Sulfur 20', 'Sulfur 20'],
    'Col3': ['Hydrogen 12', 'Iron 3', 'Sulfur Dioxide 4', 'Carbon 4', 'Carbon 4'],
    'Col4': ['Carbon Monoxide 12', 'Carbon 6', 'Sulfur 8', 'Sulfur 8', 'Sulfur 8']
})
temp = df.agg(lambda x: (x.str.split().str[:-1].str.join(' ').values, x.str.split().str[-1].values)).T.explode([0,1])
temp.index = np.array([f"c{i}" for i in range(int(len(temp)/4))]*4).flatten()
temp[1]=temp[1].astype(int)
temp2 = temp.groupby([temp.index, 0])[1].sum()
temp2.max(level=1)
temp2.reset_index().sort_values(1, ascending = False).groupby('level_0')[[0,1]].apply(lambda x: x.iloc[0])

Output:

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 keramat