Slow behaviour of DataFrame.groupby.apply in pandas 1.4.1
I recently ran into a slowdown using df.groupby.apply in pandas 1.4.1 that does not happen in pandas 1.2.0.
My code is:
def is_detect(lc, thr=5., n_det=4):
    SNR_det = lc['flux'] / lc['fluxerr'] > thr
    if SNR_det.sum() > n_det:
        return True
    return False
isdetect_mask = sim.sim_lcs.groupby('ID').apply(is_detect)
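Independently of the version difference, the same mask can usually be computed without calling a Python function once per group: evaluate the SNR test on the whole frame, then count detections per group. The sketch below assumes 'ID' is an index level of sim.sim_lcs (groupby('ID') accepts either a column or an index level name) and uses a hypothetical toy frame in place of the real light curves; is_detect_vectorized is a name introduced here, not part of pandas.

```python
import numpy as np
import pandas as pd


def is_detect(lc, thr=5., n_det=4):
    # The question's per-group function, kept here for comparison.
    SNR_det = lc['flux'] / lc['fluxerr'] > thr
    return bool(SNR_det.sum() > n_det)


def is_detect_vectorized(lcs, thr=5., n_det=4):
    """Hypothetical vectorized counterpart of is_detect: one boolean Series
    for the whole frame, then a per-group count of points above thr."""
    snr_det = lcs['flux'] / lcs['fluxerr'] > thr
    # Assumes 'ID' is an index level; if it is a regular column, use
    # snr_det.groupby(lcs['ID']) instead.
    return snr_det.groupby(level='ID').sum() > n_det


# Hypothetical toy light curves with an ('ID', 'epoch') MultiIndex.
idx = pd.MultiIndex.from_product([[0, 1, 2], range(10)], names=['ID', 'epoch'])
rng = np.random.default_rng(0)
lcs = pd.DataFrame({'flux': rng.uniform(10, 100, len(idx)),
                    'fluxerr': rng.uniform(1, 20, len(idx))}, index=idx)

print(is_detect_vectorized(lcs))
```

Both versions return a boolean Series indexed by 'ID'; the vectorized one does the arithmetic in a single pass and leaves only a cheap per-group sum to groupby.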
where sim.sim_lcs is a pandas DataFrame with a MultiIndex. I timed it with %timeit on pandas 1.2.0 and 1.4.1:
%timeit isdetect_mask = sim.sim_lcs.loc[:100].groupby('ID').apply(is_detect)
This gives 213 ms ± 37.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) with pandas 1.2.0, and 27 s ± 5.48 s per loop (mean ± std. dev. of 7 runs, 1 loop each) with pandas 1.4.1, i.e. more than 100 times slower.
I built a simpler example to try to reproduce this behaviour:
import numpy as np
import pandas as pd

print(pd.__version__)
n = 30000
# Define a pandas DataFrame: n groups 'A', each with 50 to 99 rows indexed by 'B'
dic = {'A': [],
       'B': [],
       'F': [],
       'eF': []}
for i in range(n):
    nsub_idx = np.random.randint(50, 100)
    dic['A'] += [i] * nsub_idx
    dic['B'] += [j for j in range(nsub_idx)]
    F = np.random.uniform(10, 100, size=nsub_idx)
    dic['F'] += list(F)
    dic['eF'] += list(np.random.normal(0, np.sqrt(F)))
df = pd.DataFrame(dic)
df.set_index(['A', 'B'], inplace=True)

# Function to apply
def _test_(elmt, thr=5., n_det=4):
    test = elmt['F'] / elmt['eF'] > thr
    if test.sum() > n_det:
        return True
    return False
%timeit mask = df.groupby('A').apply(_test_)
But when I run this example, pandas 1.4.1 performs better than 1.2.0, so I suspect the problem is on my side, but I can't figure out what it is.
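Since the synthetic frame does not reproduce the slowdown, one way to narrow it down is to compare the structural properties of sim.sim_lcs and df side by side (number of columns, dtypes, index levels, whether the index is sorted or unique). The helper below is a hypothetical sketch; it only gathers standard pandas attributes and does not identify the cause by itself.

```python
import numpy as np
import pandas as pd


def frame_profile(df):
    """Hypothetical helper: collect properties that can affect groupby cost,
    so the real data and the synthetic example can be compared."""
    return {
        'n_rows': len(df),
        'n_columns': df.shape[1],
        'index_names': list(df.index.names),
        'index_nlevels': df.index.nlevels,
        'index_monotonic': df.index.is_monotonic_increasing,
        'index_unique': df.index.is_unique,
        'dtypes': {col: str(dt) for col, dt in df.dtypes.items()},
        'memory_mb': df.memory_usage(deep=True).sum() / 1e6,
    }


# Toy frame with the same layout as the synthetic example.
idx = pd.MultiIndex.from_product([[0, 1], [0, 1, 2]], names=['A', 'B'])
toy = pd.DataFrame({'F': np.arange(6.0), 'eF': np.ones(6)}, index=idx)
print(frame_profile(toy))
```

Usage would be to print frame_profile(sim.sim_lcs) next to frame_profile(df) under pandas 1.4.1. Differences such as many extra columns (apply slices every column for each group), object-dtype columns, or an unsorted index are places worth checking first.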
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow