'Python Pandas: Create a dictionary from a dataframe with values equal to a list of all row values
I am trying to create dictionary from a dictionary from a dataframe in the following way. My dataframe contains a column called station_id. The station_id values are unique. That is each row correspond to station id. Then there is another column called trip_id (see example below). Many stations can be associated with a single trip_id. For example
l1=[1,1,2,2]
l2=[34,45,66,67]
df1=pd.DataFrame(list(zip(l1,l2)),columns=['trip_id','station_name'])
df1.head()
trip_id station_name
0 1 34
1 1 45
2 2 66
3 2 67
I am trying to get a dictionary d={1:[34,45],2:[66,67]}
.
I solved it with a for loop in the following fashion.
from tqdm import tqdm
Trips_Stations={}
Trips=set(df['trip_id'])
T=list(Trips)
for i in tqdm(range(len(Trips))):
c_id=T[i]
Values=list(df[df.trip_id==c_id].stop_id)
Trips_Stations.update({c_id:Values})
Trips_Stations
My actual dataset has about 65000 rows. The above takes about 2 minutes to run. While this is acceptable for my application, I was wondering if there is a faster way to do it using base pandas.
Thanks
Solution 1:[1]
somehow stackoverflow suggested that I look at Group_By
This is much faster
d=df.groupby('trip_id')['stop_id'].apply(list)
from collections import OrderedDict, defaultdict
o_d=d.to_dict(OrderedDict)
o_d=dict(o_d)
It took about 30 secs for the dataframe with 65000 rows. Then
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | kangkan Dc |