Pandas: Create new column based on text values of other columns
My dataframe looks like this:
    id                  text        labels
0  447  glutamine synthetase     [protein]
1  447                    GS     [protein]
2  447              hepatoma  [indication]
3  447                   NaN           NaN
4  442         Metachromatic  [indication]
I want to transform the dataframe and create two new columns, protein and indication, that contain the text values whose labels entry is protein or indication, combined per id.
Wanted output
    id                   protein     indication
0  447  glutamine synthetase, GS       hepatoma
0  442                       NaN  Metachromatic
Can someone help me with how to do this?
Solution 1:[1]
Use df.explode with GroupBy.agg and df.pivot:
In [417]: out = df.explode('labels').groupby(['id', 'labels'])['text'].agg(','.join).reset_index().pivot(index='id', columns='labels').reset_index().droplevel(0, axis=1).rename_axis(None, axis=1)
In [423]: out.columns = ['id', 'indication', 'protein']
In [424]: out
Out[424]:
    id     indication                  protein
0  442  Metachromatic                      NaN
1  447       hepatoma  glutamine synthetase,GS
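
The one-line chain above can be hard to follow, so here is a minimal step-by-step sketch of the same idea. It rests on a few assumptions that are not confirmed by the question: that each labels entry is a one-element list (as the printed [protein] suggests), that the sample frame can be rebuilt as shown, and that ', ' is the desired separator (to match the wanted output). It also swaps pivot for unstack, which produces flat column names and so avoids renaming the columns by hand.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample frame; labels are taken to be
# one-element lists, which is why explode is needed at all.
df = pd.DataFrame({
    "id": [447, 447, 447, 447, 442],
    "text": ["glutamine synthetase", "GS", "hepatoma", np.nan, "Metachromatic"],
    "labels": [["protein"], ["protein"], ["indication"], np.nan, ["indication"]],
})

# Step 1: unwrap the lists so each row carries a plain string label.
exploded = df.explode("labels")

# Step 2: drop rows without text or label, then join the texts
# of each (id, label) pair into one string.
joined = (
    exploded.dropna(subset=["text", "labels"])
    .groupby(["id", "labels"])["text"]
    .agg(", ".join)
)

# Step 3: move the label level into columns; an id that lacks
# a given label ends up with NaN in that column.
out = joined.unstack("labels").rename_axis(None, axis=1).reset_index()

print(out)
```

If labels already holds plain strings rather than lists, the explode step leaves the column unchanged, so the rest of the sketch should still apply.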
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |