Pandas Melt several groups of columns into multiple target columns by name

I would like to melt several groups of columns of a dataframe into multiple target columns. This is similar to the questions "Python Pandas Melt Groups of Initial Columns Into Multiple Target Columns" and "pandas dataframe reshaping/stacking of multiple value variables into seperate columns". However, I need to do this explicitly by column name, rather than by index location.

import pandas as pd
# The id column is included so the frame matches the table shown below.
df = pd.DataFrame([(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'), (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
                  columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'])
df

Original Dataframe:

    id   a_1  a_2  a_3  b_1  b_2  b_3  c_1  c_2  c_3
0   101   a    b    c    1    2    3    aa   bb   cc
1   102   d    e    f    4    5    6    dd   ee   ff

Target Dataframe:

     id   a   b   c
0   101   a   1   aa
1   101   b   2   bb
2   101   c   3   cc
3   102   d   4   dd
4   102   e   5   ee
5   102   f   6   ff

Advice is much appreciated on an approach to this.



Solution 1:[1]

There is a more efficient way to handle this type of problem, where several different sets of columns have to be melted at once. pd.wide_to_long is built for exactly these situations.

pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='id', j='dropme', sep='_')\
  .reset_index()\
  .drop('dropme', axis=1)\
  .sort_values('id')

    id  a  b   c
0  101  a  1  aa
2  101  b  2  bb
4  101  c  3  cc
1  102  d  4  dd
3  102  e  5  ee
5  102  f  6  ff
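
If it helps to see which numbered column each row came from, the suffix can be kept rather than dropped. The following is a minimal, self-contained sketch of the same call; it assumes the sample frame from the question with an id column, and `suffix` is only an illustrative name for the `j` argument.

```python
import pandas as pd

# Sample frame from the question, including the id column.
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'],
)

# 'suffix' is a hypothetical name for the column that receives 1, 2, 3.
long_df = (
    pd.wide_to_long(df, stubnames=['a', 'b', 'c'], i='id', j='suffix', sep='_')
      .reset_index()
      .sort_values(['id', 'suffix'])   # order rows by id, then by original suffix
      .reset_index(drop=True)          # renumber the rows 0..5
)
print(long_df)
```

Sorting on both id and suffix makes the row order explicit instead of relying on how the rows happen to come out of the reshape.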

Solution 2:[2]

You can convert the column names to a MultiIndex based on the pattern in the column names, and then stack at a particular level, depending on the result you need:

import pandas as pd
df.set_index('id', inplace=True)
df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_")))
df.stack(level = 1).reset_index(level = 1, drop = True).reset_index()

# id    a   b    c      
#101    a   1   aa
#101    b   2   bb
#101    c   3   cc
#102    d   4   dd
#102    e   5   ee
#102    f   6   ff
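
The snippet above modifies df in place. A variant that leaves df untouched and names the two column levels might look like the sketch below; the level names `group` and `suffix` are assumptions chosen for readability, not anything pandas requires. Recent pandas 2.x releases may emit a FutureWarning from stack about its changing implementation; the result for this data should be the same.

```python
import pandas as pd

# Sample frame from the question, including the id column.
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'],
)

# set_index returns a new frame, so df itself keeps its flat column names.
wide = df.set_index('id')

# Split 'a_1' into ('a', '1'); 'group' and 'suffix' are hypothetical level names.
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split('_')) for name in wide.columns],
    names=['group', 'suffix'],
)

long_df = (
    wide.stack(level='suffix')                   # move the suffix level into the rows
        .reset_index(level='suffix', drop=True)  # discard the suffix values
        .reset_index()                           # turn id back into a column
)
long_df.columns.name = None  # remove the leftover 'group' label on the columns
print(long_df)
```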

Solution 3:[3]

cols = df.columns.difference(['id'])

pd.lreshape(df, cols.groupby(cols.str.split('_').str[0])).sort_values('id')
Out: 
    id  a   c  b
0  101  a  aa  1
2  101  b  bb  2
4  101  c  cc  3
1  102  d  dd  4
3  102  e  ee  5
5  102  f  ff  6
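
Since the question asks for the grouping to be explicit by column name, the groups passed to pd.lreshape can also be written out by hand rather than derived from the names. This is a minimal sketch assuming the sample frame from the question; the stable sort is my addition to make the row order within each id deterministic.

```python
import pandas as pd

# Sample frame from the question, including the id column.
df = pd.DataFrame(
    [(101, 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     (102, 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['id', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3'],
)

# Map each target column to the source columns that feed it, by name.
groups = {
    'a': ['a_1', 'a_2', 'a_3'],
    'b': ['b_1', 'b_2', 'b_3'],
    'c': ['c_1', 'c_2', 'c_3'],
}

long_df = (
    pd.lreshape(df, groups)
      .sort_values('id', kind='stable')  # keep the _1, _2, _3 order within each id
      .reset_index(drop=True)
)
print(long_df)
```

Each list in groups must have the same length, because the columns are stacked position by position.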

Solution 4:[4]

One option is pivot_longer from pyjanitor, which abstracts the process, and is efficient:

# pip install pyjanitor
import janitor

df.pivot_longer(
    index = None, 
    names_to = '.value', 
    names_pattern = '([a-z]+)_*')

   a  b   c
0  a  1  aa
1  d  4  dd
2  b  2  bb
3  e  5  ee
4  c  3  cc
5  f  6  ff

The idea for this particular reshape is that the part of each column name matched by the capture group in the regular expression is paired with .value, and therefore stays as a column header instead of being turned into row values.
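
To see what the capture group picks out of each column name, the pattern can be tried with Python's standard re module. This sketch only illustrates the regular expression; it is not a description of how pyjanitor applies it internally.

```python
import re

pattern = r'([a-z]+)_*'
columns = ['a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3', 'c_1', 'c_2', 'c_3']

# group(1) is the text matched by the parenthesised group: the letters before "_".
headers = [re.match(pattern, name).group(1) for name in columns]
print(headers)

# Dropping duplicates while keeping first-seen order gives the target headers.
unique_headers = list(dict.fromkeys(headers))
print(unique_headers)
```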

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Ted Petrou
Solution 2    Psidom
Solution 3    ayhan
Solution 4    sammywemmy