'How to pivot a dataframe to a wide format?

Suppose I have a pandas DataFrame like this:

import pandas as pd

data = pd.DataFrame({'header': ['age', 'height', 'weight', 'country', 'age', 'height', 'weight', 'bank_id', 'age', 'height', 'weight', 'country'],
                     'values': ['1', '6 ft', '10 kg', 'India', '2', '5 ft', '20 kg', 'A123', '3', '5.5 ft', '30 kg', 'Japan']})

# display(data)
     header  values
0       age       1
1    height    6 ft
2    weight   10 kg
3   country   India
4       age       2
5    height    5 ft
6    weight   20 kg
7   bank_id    A123
8       age       3
9    height  5.5 ft
10   weight   30 kg
11  country   Japan

Now I want to transpose them using Python so that they look something like this:

enter image description here

There are some rows with no data, and they will remain blank.

I was trying with the code:

data.pivot_table(columns="header",values="values",aggfunc="max")

[out]:
header age bank_id country height weight
values   3    A123   Japan   6 ft  30 kg

But it is not giving the proper result. It's only showing a single row.



Solution 1:[1]

  • Pivot doesn't work quite as expected, because the values in data have unique indices.
  • In order for pivot to correctly associate the values, the groups must share an index.
  • The rows can be grouped by 4, in this case, and are ordered, so we can create a new index, and correctly pivot data.
  • This uses an assignment expression, :=, and is only valid from python 3.8.
import pandas as pd

# set up test dataframe
data = pd.DataFrame({'header': ['age', 'height', 'weight', 'country', 'age', 'height', 'weight', 'bank_id', 'age', 'height', 'weight', 'country'],
                     'values': ['1', '6 ft', '10 kg', 'India', '2', '5 ft', '20 kg', 'A123', '3', '5.5 ft', '30 kg', 'Japan']})

# create a unique index; replace 4 with the real group size
# the associated groups in data, must be consecutive
x = 0
data.index = [x := x+1 if i%4 == 0 else x for i, _ in enumerate(data.index)]

# see the indices are matched for each group compared to the OP
    header  values
1      age       1
1   height    6 ft
1   weight   10 kg
1  country   India
2      age       2
2   height    5 ft
2   weight   20 kg
2  bank_id    A123
3      age       3
3   height  5.5 ft
3   weight   30 kg
3  country   Japan

# create a wide dataframe
wide = data.pivot(columns='header', values='values').reset_index(drop=True)

# header is the .name of columns, to make it nothing
wide.columns.name = None
# display(wide)
age bank_id country  height weight
  1     NaN   India    6 ft  10 kg
  2    A123     NaN    5 ft  20 kg
  3     NaN   Japan  5.5 ft  30 kg

Solution 2:[2]

One option would be pivoting columns in order to get distinct columns for the new dataframe, and then eliminate NaN values for each column, and lastly combine them by using pandas.concat function :

import pandas as pd

data = pd.DataFrame({'header': ["age" , "height" ,  "weight" , "bank_id",  "country", "age" , "height" ,  "weight" , "bank_id", "country", "age" , "height" ,  "weight" ,  "country" ],
                     'values': [   "1",    "6 ft",    "10 kg",        "",    "India",   "2" ,   "5 ft" ,   "20 kg" ,    "A123",        "",   "3" , "5.5 ft" ,   "30 kg" ,    "Japan" ]})

pvt_data = data.pivot( columns='header', values='values' )
ls_col = list(pvt_data.columns)

ls_cols = []
for col in ls_col:     
    ls_cols.append(pvt_data[col].dropna().reset_index(drop=True, inplace=False))

print( pd.concat([ls_cols[0],ls_cols[1],ls_cols[2],ls_cols[3],ls_cols[4]],axis=1) )
      age bank_id country  height weight
    0   1           India    6 ft  10 kg
    1   2    A123            5 ft  20 kg
    2   3     NaN   Japan  5.5 ft  30 kg

Solution 3:[3]

age seems to be the starting point of every new data point; we can use that knowledge when reshaping the data, to indicate the start of a new row for pivoting:

(data
.assign(counter = data.header.eq('age').cumsum())
.pivot('counter', 'header', 'values')
.loc[:, ['age', 'height', 'weight', 'country', 'bank_id']]
.rename_axis(index = None, columns = None)
.reset_index(drop=True)
)
  age  height weight country bank_id
0   1    6 ft  10 kg   India     NaN
1   2    5 ft  20 kg     NaN    A123
2   3  5.5 ft  30 kg   Japan     NaN

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 sammywemmy