'Rearrange pandas Dataframes
I decide to simplify my post and replace images with code which has the same structure (and problem) inside and everyone could 'copy-paste' this example to try solve it.
import pandas as pd
df = pd.DataFrame(index=range(15),
data={'params': ['TerminalName', 'Installed', 'Locked', 'InstallDate', 'NbBikes',
'TerminalName', 'Installed', 'Locked', 'InstallDate', 'NbBikes',
'TerminalName', 'Installed', 'Locked', 'InstallDate', 'NbBikes',],
'vals': ['001023', True, False, 1278947280000, 15,
'001023', True, False, 1278947280000, 15,
'001023', True, False, 1278947280000, 15],
'id_bikepoint': ['BikePoints_1', 'BikePoints_1', 'BikePoints_1', 'BikePoints_1', 'BikePoints_1',
'BikePoints_3', 'BikePoints_3', 'BikePoints_3', 'BikePoints_3', 'BikePoints_3',
'BikePoints_6', 'BikePoints_6', 'BikePoints_6', 'BikePoints_6', 'BikePoints_6'],
'address': ['addr1', 'addr1', 'addr1', 'addr1', 'addr1',
'addr2', 'addr2', 'addr2', 'addr2', 'addr2',
'addr3', 'addr3', 'addr3', 'addr3', 'addr3'],
'lat': [51.529163, 51.529163, 51.529163, 51.529163, 51.529163,
51.536392, 51.536392, 51.536392, 51.536392, 51.536392,
51.213342, 51.213342, 51.213342, 51.213342, 51.213342],
'lon': [-0.10997, -0.10997, -0.10997, -0.10997, -0.10997,
-0.112721, -0.112721, -0.112721, -0.112721, -0.112721,
-0.115231, -0.115231, -0.115231, -0.115231, -0.115231]
})
params vals id_bikepoint address lat lon
0 TerminalName 001023 BikePoints_1 addr1 51.529163 -0.109970
1 Installed True BikePoints_1 addr1 51.529163 -0.109970
2 Locked False BikePoints_1 addr1 51.529163 -0.109970
3 InstallDate 1278947280000 BikePoints_1 addr1 51.529163 -0.109970
4 NbBikes 15 BikePoints_1 addr1 51.529163 -0.109970
5 TerminalName 001021 BikePoints_3 addr2 51.536392 -0.112721
6 Installed True BikePoints_3 addr2 51.536392 -0.112721
7 Locked False BikePoints_3 addr2 51.536392 -0.112721
8 InstallDate 1278947270000 BikePoints_3 addr2 51.536392 -0.112721
9 NbBikes 18 BikePoints_3 addr2 51.536392 -0.112721
10 TerminalName 001016 BikePoints_6 addr3 51.213342 -0.115231
11 Installed True BikePoints_6 addr3 51.213342 -0.115231
12 Locked False BikePoints_6 addr3 51.213342 -0.115231
13 InstallDate 1278947260000 BikePoints_6 addr3 51.213342 -0.115231
14 NbBikes 7 BikePoints_6 addr3 51.213342 -0.115231
Description of situation:
First 2 columns need to be replaced by 5 columns with the names 'TerminalName', 'Installed', 'Locked', 'InstallDate', 'NbBikes' and values taken from 'vals' column (pivot/transforming are not working for me). Every column will be with its own data type (int, string, float), which is preferred for numpy.
Columns 'id_bikepoint','address','lat','lon' have the same data repeated many times (useless redundancy of code) and this also need to be simplify.
Desired result:
TerminalName Installed Locked InstallDate NbBikes id_bikepoint \
0 001023 True False 1278947280000 15 BikePoints_1
1 001021 True False 1278947270000 18 BikePoints_3
2 001016 True False 1278947260000 7 BikePoints_6
address lat lon
0 addr1 51.529163 -0.109970
1 addr2 51.536392 -0.112721
2 addr3 51.213342 -0.115231
Solution 1:[1]
I think calling pd.pivot_table() on the first couple columns should get you off to a good start! The below is untested though.
to_pivot = df[['bikepoint_params','bikepoint_values','bikepoint_id']]
mydf = pd.pivot_table(to_pivot,
columns='bikepoint_params',
values='bikepoint_values',
index='bikepoint_id')
mydf2 = df.drop(['bikepoint_params','bikepoint_values'],axis=1).groupby('bikepoint_id').first()
final = mydf.join(mydf2)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Mose Wintner |