'Pandas : Create new column based on text values of other columns

My dataframe looks like this:

    id          text                    labels
0   447         glutamine synthetase    [protein]
1   447         GS                      [protein]
2   447         hepatoma                [indication]
3   447         NaN                      NaN
4   442         Metachromatic           [indication]

I want to transform the dataframe and create two new columns named proteins and indications that contain the text when labels is protein or indication for the same id.

Wanted output

    id          protein                     indication
0   447         glutamine synthetase, GS    hepatoma
0   442         NaN                         Metachromatic

Can someone help how to do this?



Solution 1:[1]

Use df.explode with Groupby.agg and df.pivot:

In [417]: out = df.explode('labels').groupby(['id', 'labels'])['text'].agg(','.join).reset_index().pivot('id', 'labels').reset_index().droplevel(0, axis=1).rename_axis(None, axis=1)

In [423]: out.columns = ['id', 'indication', 'protein']

In [424]: out
Out[424]: 
    id     indication                  protein
0  442  Metachromatic                      NaN
1  447       hepatoma  glutamine synthetase,GS

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1