pandas: most elegant way to pivot table on pattern in name of columns

Given the following DataFrame:

import pandas as pd

df = pd.DataFrame({
  'x': [0, 1],
  'y': [0, 1],
  'a_idx': [0, 1],
  'a_val': [2, 3],
  'b_idx': [4, 5],
  'b_val': [6, 7],
})

What is the cleanest way to pivot the DataFrame on the prefix of the idx and val columns when there is an arbitrary number of unique prefixes (a, b, ... n), so as to obtain the following DataFrame?

pd.DataFrame({
  'x': [0, 1, 0, 1],
  'y': [0, 1, 0, 1],
  'key': ['a','a','b','b'],
  'idx': [0, 1, 4, 5],
  'val': [2, 3, 6, 7]
})

I am not very familiar with pandas, so my simplest solution was to go back to the data generation step, produce a subset of the result DataFrame for each prefix in SQL, and then concatenate the result sets into a final DataFrame. I am curious, however, whether there is a simple way to do this with the pandas.DataFrame API. Is there such a thing?



Solution 1:[1]

Let's try pd.wide_to_long, followed by an unstack/stack to swap the prefixes and suffixes:

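(pd.wide_to_long(df, stubnames=['a', 'b'],
                 i=['x', 'y'],
                 j='key',
                 sep='_',
                 suffix=r'\w+')
   .unstack('key')   # spread the idx/val suffixes into a second column level
   .stack(level=0)   # move the a/b prefixes into the row index
   .reset_index()
)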
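
The stub names above are hard-coded. As a rough sketch for an arbitrary number of prefixes, they could be derived from the column names instead. This is not part of the original answer: it assumes every non-identifier column is named <prefix>_<field> and that x and y identify each row uniquely, and the names id_cols, stubs, field and wide_result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [0, 1],
    'y': [0, 1],
    'a_idx': [0, 1],
    'a_val': [2, 3],
    'b_idx': [4, 5],
    'b_val': [6, 7],
})

id_cols = ['x', 'y']  # assumption: these identify each row uniquely

# Derive the prefixes from the column names instead of listing them by hand.
stubs = sorted({c.split('_', 1)[0] for c in df.columns if c not in id_cols})

long = pd.wide_to_long(df, stubnames=stubs, i=id_cols, j='field',
                       sep='_', suffix=r'\w+')

wide_result = (
    long.rename_axis(columns='key')  # name the a/b column axis 'key'
        .stack()                     # the a/b prefixes move into the row index
        .unstack('field')            # the idx/val suffixes become the columns
        .rename_axis(None, axis=1)   # drop the leftover columns-axis name
        .reset_index()
)
```

Naming the column axis before stacking is what makes the prefix column come out as key rather than level_2.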

Or manually with melt:

out = df.melt(['x', 'y'])
out = (out.join(out['variable'].str.split('_', expand=True))
       .rename(columns={0: 'key'})
       .pivot_table(index=['x', 'y', 'key'], columns=[1], values='value')
       .reset_index()
       )

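Output of the wide_to_long version (the prefix column is labelled level_2, and key is the name of the columns axis):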

key  x  y level_2  idx  val
0    0  0       a    0    2
1    0  0       b    4    6
2    1  1       a    1    3
3    1  1       b    5    7
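
For comparison, the per-prefix concatenation that the question describes doing in SQL can also be written directly in pandas. The following is a minimal sketch, not part of the original answer. It assumes every non-identifier column is named <prefix>_<field>, and the names id_cols, prefixes, parts and result are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [0, 1],
    'y': [0, 1],
    'a_idx': [0, 1],
    'a_val': [2, 3],
    'b_idx': [4, 5],
    'b_val': [6, 7],
})

id_cols = ['x', 'y']  # assumption: all other columns look like <prefix>_<field>

# Collect the prefixes in order of first appearance, without duplicates.
prefixes = list(dict.fromkeys(
    c.split('_', 1)[0] for c in df.columns if c not in id_cols
))

parts = []
for prefix in prefixes:
    start = prefix + '_'
    cols = [c for c in df.columns if c not in id_cols and c.startswith(start)]
    # Keep the id columns plus this prefix's columns, with the prefix stripped.
    part = df[id_cols + cols].rename(columns={c: c[len(start):] for c in cols})
    part.insert(len(id_cols), 'key', prefix)  # record which prefix the rows came from
    parts.append(part)

result = pd.concat(parts, ignore_index=True)
```

Unlike the reshaping versions, this keeps the rows grouped by prefix, which matches the row order of the target DataFrame in the question.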

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
