'pandas: most elegant way to pivot table on pattern in name of columns
Given the following DataFrame:
pd.DataFrame({
'x': [0, 1],
'y': [0, 1],
'a_idx': [0, 1],
'a_val': [2, 3],
'b_idx': [4, 5],
'b_val': [6, 7],
})
What is the cleanest way to pivot the DataFrame based on the prefix of the idx and val columns if you have an indeterminate amount of unique prefixes (a, b, ... n), so as to obtain the following DataFrame?
pd.DataFrame({
'x': [0, 1, 0, 1],
'y': [0, 1, 0, 1],
'key': ['a','a','b','b'],
'idx': [0, 1, 4, 5],
'val': [2, 3, 6, 7]
})
I am not very knowledgeable in pandas, so my easiest solution was to go earlier in the data generation process and generate a subset of the result DataFrame for each prefix in SQL, and then concat the result sets into a final DataFrame. I'm curious however if there is a simple way to do this using the API of pandas.DataFrame. Is there such a thing?
Solution 1:[1]
Let's try wide_to_long
with extras:
(pd.wide_to_long(df,stubnames=['a','b'],
i=['x','y'],
j='key',
sep='_',
suffix='\\w+'
)
.unstack('key').stack(level=0).reset_index()
)
Or manually with melt
:
out = df.melt(['x', 'y'])
out = (out.join(out['variable'].str.split('_', expand=True))
.rename(columns={0: 'key'})
.pivot_table(index=['x', 'y', 'key'], columns=[1], values='value')
.reset_index()
)
Output:
key x y level_2 idx val
0 0 0 a 0 2
1 0 0 b 4 6
2 1 1 a 1 3
3 1 1 b 5 7
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |