'How to pivot a dataframe to a wide format?
Suppose I have a pandas DataFrame like this:
import pandas as pd
data = pd.DataFrame({'header': ['age', 'height', 'weight', 'country', 'age', 'height', 'weight', 'bank_id', 'age', 'height', 'weight', 'country'],
'values': ['1', '6 ft', '10 kg', 'India', '2', '5 ft', '20 kg', 'A123', '3', '5.5 ft', '30 kg', 'Japan']})
# display(data)
header values
0 age 1
1 height 6 ft
2 weight 10 kg
3 country India
4 age 2
5 height 5 ft
6 weight 20 kg
7 bank_id A123
8 age 3
9 height 5.5 ft
10 weight 30 kg
11 country Japan
Now I want to transpose them using Python so that they look something like this:
There are some rows with no data, and they will remain blank.
I was trying with the code:
data.pivot_table(columns="header",values="values",aggfunc="max")
[out]:
header age bank_id country height weight
values 3 A123 Japan 6 ft 30 kg
But it is not giving the proper result. It's only showing a single row.
Solution 1:[1]
- Pivot doesn't work quite as expected, because the values in
data
have unique indices. - In order for pivot to correctly associate the values, the groups must share an index.
- The rows can be grouped by 4, in this case, and are ordered, so we can create a new index, and correctly pivot
data
. - This uses an assignment expression,
:=
, and is only valid from python 3.8.
import pandas as pd
# set up test dataframe
data = pd.DataFrame({'header': ['age', 'height', 'weight', 'country', 'age', 'height', 'weight', 'bank_id', 'age', 'height', 'weight', 'country'],
'values': ['1', '6 ft', '10 kg', 'India', '2', '5 ft', '20 kg', 'A123', '3', '5.5 ft', '30 kg', 'Japan']})
# create a unique index; replace 4 with the real group size
# the associated groups in data, must be consecutive
x = 0
data.index = [x := x+1 if i%4 == 0 else x for i, _ in enumerate(data.index)]
# see the indices are matched for each group compared to the OP
header values
1 age 1
1 height 6 ft
1 weight 10 kg
1 country India
2 age 2
2 height 5 ft
2 weight 20 kg
2 bank_id A123
3 age 3
3 height 5.5 ft
3 weight 30 kg
3 country Japan
# create a wide dataframe
wide = data.pivot(columns='header', values='values').reset_index(drop=True)
# header is the .name of columns, to make it nothing
wide.columns.name = None
# display(wide)
age bank_id country height weight
1 NaN India 6 ft 10 kg
2 A123 NaN 5 ft 20 kg
3 NaN Japan 5.5 ft 30 kg
Solution 2:[2]
One option would be pivoting columns in order to get distinct columns for the new dataframe, and then eliminate NaN values for each column, and lastly combine them by using pandas.concat
function :
import pandas as pd
data = pd.DataFrame({'header': ["age" , "height" , "weight" , "bank_id", "country", "age" , "height" , "weight" , "bank_id", "country", "age" , "height" , "weight" , "country" ],
'values': [ "1", "6 ft", "10 kg", "", "India", "2" , "5 ft" , "20 kg" , "A123", "", "3" , "5.5 ft" , "30 kg" , "Japan" ]})
pvt_data = data.pivot( columns='header', values='values' )
ls_col = list(pvt_data.columns)
ls_cols = []
for col in ls_col:
ls_cols.append(pvt_data[col].dropna().reset_index(drop=True, inplace=False))
print( pd.concat([ls_cols[0],ls_cols[1],ls_cols[2],ls_cols[3],ls_cols[4]],axis=1) )
age bank_id country height weight
0 1 India 6 ft 10 kg
1 2 A123 5 ft 20 kg
2 3 NaN Japan 5.5 ft 30 kg
Solution 3:[3]
age
seems to be the starting point of every new data point; we can use that knowledge when reshaping the data, to indicate the start of a new row for pivoting:
(data
.assign(counter = data.header.eq('age').cumsum())
.pivot('counter', 'header', 'values')
.loc[:, ['age', 'height', 'weight', 'country', 'bank_id']]
.rename_axis(index = None, columns = None)
.reset_index(drop=True)
)
age height weight country bank_id
0 1 6 ft 10 kg India NaN
1 2 5 ft 20 kg NaN A123
2 3 5.5 ft 30 kg Japan NaN
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | sammywemmy |