How to speed up an ordinary dataframe loop in Python? Vectorisation? Multiprocessing?

I have a simple piece of code. Essentially, I want to speed up the loop that builds a new dataframe from existing dataframes. I haven't found an example and would appreciate anyone's help.

df_new = []

for df_i in df:
    df_selected = df[df['good_value'] == df_i_list]
    df_new = pd.concat([df_new, df_selected])


Solution 1:[1]

Given that your code does not work as posted, this is the best I can come up with.

Start with a list of dataframes, select the matching rows from each into another list, and then concat in one step.

Since concat is the heavy operation, this ensures you call it only once, which is how it is meant to be used.

import pandas as pd

dfs = [df1, df2, df3, df4, ...]

sel = [df[df['column_to_filter'] == 'good_value'] for df in dfs]

df_new = pd.concat(sel)  # might be useful to add `ignore_index=True`
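To make this concrete, here is a minimal runnable sketch of the approach with small made-up dataframes (the data, and the column name `column_to_filter`, are illustrative assumptions, not from the question):

```python
import pandas as pd

# Hypothetical example data: three small dataframes sharing a column.
df1 = pd.DataFrame({"column_to_filter": ["good_value", "bad"], "x": [1, 2]})
df2 = pd.DataFrame({"column_to_filter": ["bad", "good_value"], "x": [3, 4]})
df3 = pd.DataFrame({"column_to_filter": ["good_value"], "x": [5]})

dfs = [df1, df2, df3]

# Filter each dataframe first, collecting the results in a list...
sel = [df[df["column_to_filter"] == "good_value"] for df in dfs]

# ...then concatenate exactly once.
df_new = pd.concat(sel, ignore_index=True)

print(df_new["x"].tolist())  # rows kept: [1, 4, 5]
```

Calling `pd.concat` once on a list is much cheaper than calling it inside the loop, because each in-loop call copies everything accumulated so far.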

Solution 2:[2]

df_new = df[df['good_value'].isin(df_i_list)]

In my tests, the loop-and-concat approach was roughly 4x slower than a single `.isin()` filter.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ljmc
Solution 2 Moe D