'Rearrange pandas Dataframes

I decide to simplify my post and replace images with code which has the same structure (and problem) inside and everyone could 'copy-paste' this example to try solve it.

import pandas as pd

df = pd.DataFrame(index=range(15), 
                  data={'params': ['TerminalName', 'Installed', 'Locked', 'InstallDate', 'NbBikes',
                                   'TerminalName', 'Installed', 'Locked', 'InstallDate', 'NbBikes',
                                   'TerminalName', 'Installed', 'Locked', 'InstallDate', 'NbBikes',],
                        'vals': ['001023', True, False, 1278947280000, 15,
                                 '001023', True, False, 1278947280000, 15,
                                '001023', True, False, 1278947280000, 15],
                        'id_bikepoint': ['BikePoints_1', 'BikePoints_1', 'BikePoints_1', 'BikePoints_1', 'BikePoints_1',
                                        'BikePoints_3', 'BikePoints_3', 'BikePoints_3', 'BikePoints_3', 'BikePoints_3',
                                        'BikePoints_6', 'BikePoints_6', 'BikePoints_6', 'BikePoints_6', 'BikePoints_6'],
                        'address': ['addr1', 'addr1', 'addr1', 'addr1', 'addr1',
                                    'addr2', 'addr2', 'addr2', 'addr2', 'addr2',
                                    'addr3', 'addr3', 'addr3', 'addr3', 'addr3'],
                        'lat': [51.529163, 51.529163, 51.529163, 51.529163, 51.529163, 
                                51.536392, 51.536392, 51.536392, 51.536392, 51.536392, 
                                51.213342, 51.213342, 51.213342, 51.213342, 51.213342],
                        'lon': [-0.10997, -0.10997, -0.10997, -0.10997, -0.10997, 
                                -0.112721, -0.112721, -0.112721, -0.112721, -0.112721,
                                -0.115231, -0.115231, -0.115231, -0.115231, -0.115231]
                        })


    params        vals           id_bikepoint  address     lat       lon
0   TerminalName         001023  BikePoints_1   addr1  51.529163 -0.109970
1      Installed           True  BikePoints_1   addr1  51.529163 -0.109970
2         Locked          False  BikePoints_1   addr1  51.529163 -0.109970
3    InstallDate  1278947280000  BikePoints_1   addr1  51.529163 -0.109970
4        NbBikes             15  BikePoints_1   addr1  51.529163 -0.109970
5   TerminalName         001021  BikePoints_3   addr2  51.536392 -0.112721
6      Installed           True  BikePoints_3   addr2  51.536392 -0.112721
7         Locked          False  BikePoints_3   addr2  51.536392 -0.112721
8    InstallDate  1278947270000  BikePoints_3   addr2  51.536392 -0.112721
9        NbBikes             18  BikePoints_3   addr2  51.536392 -0.112721
10  TerminalName         001016  BikePoints_6   addr3  51.213342 -0.115231
11     Installed           True  BikePoints_6   addr3  51.213342 -0.115231
12        Locked          False  BikePoints_6   addr3  51.213342 -0.115231
13   InstallDate  1278947260000  BikePoints_6   addr3  51.213342 -0.115231
14       NbBikes              7  BikePoints_6   addr3  51.213342 -0.115231

Description of situation:

First 2 columns need to be replaced by 5 columns with the names 'TerminalName', 'Installed', 'Locked', 'InstallDate', 'NbBikes' and values taken from 'vals' column (pivot/transforming are not working for me). Every column will be with its own data type (int, string, float), which is preferred for numpy.

Columns 'id_bikepoint','address','lat','lon' have the same data repeated many times (useless redundancy of code) and this also need to be simplify.

Desired result:

TerminalName  Installed  Locked    InstallDate  NbBikes  id_bikepoint  \
0       001023       True   False  1278947280000       15  BikePoints_1   
1       001021       True   False  1278947270000       18  BikePoints_3   
2       001016       True   False  1278947260000        7  BikePoints_6   

  address        lat       lon  
0   addr1  51.529163 -0.109970  
1   addr2  51.536392 -0.112721  
2   addr3  51.213342 -0.115231


Solution 1:[1]

I think calling pd.pivot_table() on the first couple columns should get you off to a good start! The below is untested though.

to_pivot = df[['bikepoint_params','bikepoint_values','bikepoint_id']]
mydf = pd.pivot_table(to_pivot, 
                      columns='bikepoint_params', 
                      values='bikepoint_values', 
                      index='bikepoint_id')
mydf2 = df.drop(['bikepoint_params','bikepoint_values'],axis=1).groupby('bikepoint_id').first()

final = mydf.join(mydf2)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mose Wintner