Pandas Melt several groups of columns into multiple target columns by name
I would like to melt several groups of columns of a dataframe into multiple target columns. This is similar to the questions "Python Pandas Melt Groups of Initial Columns Into Multiple Target Columns" and "pandas dataframe reshaping/stacking of multiple value variables into seperate columns". However, I need to do this explicitly by column name rather than by index location.
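import pandas as pd
# The id column is included so that the frame matches the table shown below.
df = pd.DataFrame([(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'), (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
                  columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'])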
df
Original Dataframe:
    id a_1 a_2 a_3  b_1  b_2  b_3 c_1 c_2 c_3
0  101   a   b   c    1    2    3  aa  bb  cc
1  102   d   e   f    4    5    6  dd  ee  ff
Target Dataframe:
    id  a  b   c
0  101  a  1  aa
1  101  b  2  bb
2  101  c  3  cc
3  102  d  4  dd
4  102  e  5  ee
5  102  f  6  ff
Advice is much appreciated on an approach to this.
Solution 1:[1]
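pd.wide_to_long is built for exactly this kind of problem, where several different sets of columns must be melted at once. It selects columns by name: each entry in stubnames is matched against column names of the form stub + sep + suffix (the suffix is numeric by default), so nothing depends on column position.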
pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='id', j='dropme', sep='_')\
.reset_index()\
.drop('dropme', axis=1)\
.sort_values('id')
    id  a  b   c
0  101  a  1  aa
2  101  b  2  bb
4  101  c  3  cc
1  102  d  4  dd
3  102  e  5  ee
5  102  f  6  ff
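If the numeric suffix is worth keeping, a variation on the call above can name it instead of dropping it. This is a minimal sketch, assuming the frame has an id column as in the table in the question; the level name `group` is a hypothetical choice and not part of the original answer.

```python
import pandas as pd

# Sample frame matching the "Original Dataframe" table, including the id column.
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'])

# Columns are matched by name: stub + sep + numeric suffix (e.g. 'a' + '_' + '1').
# 'group' is a hypothetical name for the suffix level; it is kept rather than dropped.
long_df = (
    pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='id', j='group', sep='_')
    .reset_index()
    .sort_values(['id', 'group'])
    .reset_index(drop=True)
)
print(long_df)
```

Sorting on both id and the suffix makes the row order explicit rather than relying on the order in which wide_to_long happens to emit rows.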
Solution 2:[2]
You can convert the column names to a MultiIndex based on the column name pattern, and then stack at a particular level depending on the result you need:
import pandas as pd
df.set_index('id', inplace=True)
df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_")))
df.stack(level = 1).reset_index(level = 1, drop = True).reset_index()
# id a b c
#101 a 1 aa
#101 b 2 bb
#101 c 3 cc
#102 d 4 dd
#102 e 5 ee
#102 f 6 ff
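The same idea can be written without modifying df in place. This is a sketch under the assumption that the frame has an id column as in the question's table; the variable names `wide` and `stacked` are illustrative only. Recent pandas versions may emit a FutureWarning about the stack implementation, which does not affect the result here.

```python
import pandas as pd

# Sample frame matching the "Original Dataframe" table, including the id column.
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'])

# set_index without inplace=True returns a new frame, so df stays untouched.
wide = df.set_index('id')

# Splitting an Index with expand=True yields a MultiIndex:
# ('a', '1'), ('a', '2'), ..., ('c', '3').
wide.columns = wide.columns.str.split('_', expand=True)

# Move the suffix level (level 1) into the rows, then discard it.
stacked = (
    wide.stack(level=1)
    .reset_index(level=1, drop=True)
    .reset_index()
)
print(stacked)
```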
Solution 3:[3]
cols = df.columns.difference(['id'])
pd.lreshape(df, cols.groupby(cols.str.split('_').str[0])).sort_values('id')
Out:
    id  a   c  b
0  101  a  aa  1
2  101  b  bb  2
4  101  c  cc  3
1  102  d  dd  4
3  102  e  ee  5
5  102  f  ff  6
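The second argument to pd.lreshape is a mapping from each target column to the list of source columns that feed it. Spelling that mapping out by hand makes the "by name" grouping explicit. This is a sketch assuming the frame has an id column as in the question; the names `groups` and `reshaped` are illustrative.

```python
import pandas as pd

# Sample frame matching the "Original Dataframe" table, including the id column.
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'])

# Target column -> source columns, listed explicitly by name.
groups = {
    'a': ['a_1', 'a_2', 'a_3'],
    'b': ['b_1', 'b_2', 'b_3'],
    'c': ['c_1', 'c_2', 'c_3'],
}

# lreshape concatenates the source columns of each group one after another,
# so a stable sort on id keeps the _1, _2, _3 order within each id.
reshaped = (
    pd.lreshape(df, groups)
    .sort_values('id', kind='stable')
    .reset_index(drop=True)
)
print(reshaped)
```

Every column not named in the mapping (here only id) is treated as an identifier and repeated for each block of rows.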
Solution 4:[4]
One option is pivot_longer from pyjanitor, which abstracts the process, and is efficient:
# pip install pyjanitor
import janitor
df.pivot_longer(
index = None,
names_to = '.value',
names_pattern = '([a-z]+)_*')
   a  b   c
0  a  1  aa
1  d  4  dd
2  b  2  bb
3  e  5  ee
4  c  3  cc
5  f  6  ff
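The idea for this particular reshape is that whatever group in the regular expression is paired with .value stays as the column header. The output above has no id column, which suggests it was produced from a frame holding only the a_*, b_*, and c_* columns, with index = None.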
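To see what that capture group extracts, the pattern can be applied to the column names with Python's re module. This only illustrates the regular expression itself; it is not a description of how pyjanitor is implemented, and the names `columns` and `headers` are illustrative.

```python
import re

# Column names from the question, without the id column.
columns = ['a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3']

# Same pattern as in the pivot_longer call: one capture group for the
# leading letters, followed by zero or more underscores.
pattern = re.compile(r'([a-z]+)_*')

# Map each original column to the text captured by the group.
headers = {col: pattern.match(col).group(1) for col in columns}
print(headers)
```

Each of the nine columns maps to one of three captured values, and those three values are the headers of the reshaped frame.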
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ted Petrou |
| Solution 2 | Psidom |
| Solution 3 | ayhan |
| Solution 4 | sammywemmy |