'Add together numbers next to duplicate strings that occur across columns for each row

I have a dataset that looks like this:

Col1	Col2	Col3	Col4
Carbon Dioxide 32	Carbon Monoxide 30	Hydrogen 12	Carbon Monoxide 12
Carbon 12	Carbon Dioxide 20	Iron 3	Carbon 6
Sulfur 81	Sulfur 20	Sulfur Dioxide 4	Sulfur 8
Sulfur 81	Sulfur 20	Carbon 4	Sulfur 8
Carbon 81	Sulfur 20	Carbon 4	Sulfur 8

For every row, if there are duplicate strings before each number across the columns, I want to add the numbers next to the text, and rearrange the table, so I get this:

Col1	Col2	Col3	Col4
Carbon Dioxide 32	Carbon Monoxide 42	Hydrogen 12	(blank)
Carbon 18	Carbon Dioxide 20	Iron 3	(blank)
Sulfur 109	Sulfur Dioxide 4	(blank)	(blank)
Sulfur 109	Carbon 4	(blank)	(blank)
Carbon 85	Sulfur 28	(blank)	(blank)

Then I want to pick the row which has the largest number next to it, so I get:

Col1
Carbon Monoxide 42
Carbon Dioxide 20
Sulfur 109
Sulfur 109
Carbon 85

Also, please note, my actual data frame is pretty big (25K+ rows) and I don't know all the possible elements/compounds beforehand.

How can achieve the above in Python?

Any help will be appreciated. Thanks in advance!

#Also, here is the python code for generating the first table above for convenience:

df = pd.DataFrame({
    'Col1': ['Carbon Dioxide 32', 'Carbon 12', 'Sulfur 81', 'Sulfur 81', 'Carbon 81'],
    'Col2': ['Carbon Monoxide 30', 'Carbon Dioxide 20', 'Sulfur 20', 'Sulfur 20', 'Sulfur 20'],
    'Col3': ['Hydrogen 12', 'Iron 3', 'Sulfur Dioxide 4', 'Carbon 4', 'Carbon 4'],
    'Col4': ['Carbon Monoxide 12', 'Carbon 6', 'Sulfur 8', 'Sulfur 8', 'Sulfur 8']
})

python-3.x pandas

Solution 1:^[1]

Use:

df = pd.DataFrame({
    'Col1': ['Carbon Dioxide 32', 'Carbon 12', 'Sulfur 81', 'Sulfur 81', 'Carbon 81'],
    'Col2': ['Carbon Monoxide 30', 'Carbon Dioxide 20', 'Sulfur 20', 'Sulfur 20', 'Sulfur 20'],
    'Col3': ['Hydrogen 12', 'Iron 3', 'Sulfur Dioxide 4', 'Carbon 4', 'Carbon 4'],
    'Col4': ['Carbon Monoxide 12', 'Carbon 6', 'Sulfur 8', 'Sulfur 8', 'Sulfur 8']
})
temp = df.agg(lambda x: (x.str.split().str[:-1].str.join(' ').values, x.str.split().str[-1].values)).T.explode([0,1])
temp.index = np.array([f"c{i}" for i in range(int(len(temp)/4))]*4).flatten()
temp[1]=temp[1].astype(int)
temp2 = temp.groupby([temp.index, 0])[1].sum()
temp2.max(level=1)
temp2.reset_index().sort_values(1, ascending = False).groupby('level_0')[[0,1]].apply(lambda x: x.iloc[0])

Output:

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	keramat

'Add together numbers next to duplicate strings that occur across columns for each row

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]