'Python Pandas: Create a dictionary from a dataframe with values equal to a list of all row values

I am trying to create dictionary from a dictionary from a dataframe in the following way. My dataframe contains a column called station_id. The station_id values are unique. That is each row correspond to station id. Then there is another column called trip_id (see example below). Many stations can be associated with a single trip_id. For example

l1=[1,1,2,2]
l2=[34,45,66,67]
df1=pd.DataFrame(list(zip(l1,l2)),columns=['trip_id','station_name'])
df1.head()

trip_id station_name
0   1   34
1   1   45
2   2   66
3   2   67

I am trying to get a dictionary d={1:[34,45],2:[66,67]}. I solved it with a for loop in the following fashion.

from tqdm import tqdm
Trips_Stations={}
Trips=set(df['trip_id'])
T=list(Trips)
for i in tqdm(range(len(Trips))):
    c_id=T[i]
    Values=list(df[df.trip_id==c_id].stop_id)

    Trips_Stations.update({c_id:Values})
Trips_Stations    

My actual dataset has about 65000 rows. The above takes about 2 minutes to run. While this is acceptable for my application, I was wondering if there is a faster way to do it using base pandas.

Thanks



Solution 1:[1]

somehow stackoverflow suggested that I look at Group_By

This is much faster

d=df.groupby('trip_id')['stop_id'].apply(list)
from collections import OrderedDict, defaultdict
o_d=d.to_dict(OrderedDict)
o_d=dict(o_d)

It took about 30 secs for the dataframe with 65000 rows. Then

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 kangkan Dc