Slow behaviour in pandas.groupby.apply in pandas 1.4.1

I recently ran into a slowdown when using df.groupby.apply in pandas 1.4.1 that does not occur in pandas 1.2.0.

My code is:

def is_detect(lc,thr=5., n_det=4):
    SNR_det = lc['flux'] / lc['fluxerr'] > thr
    if SNR_det.sum() > n_det :
        return True
    return False

isdetect_mask = sim.sim_lcs.groupby('ID').apply(is_detect)

Here sim.sim_lcs is a pandas DataFrame with a MultiIndex. I timed the call with %timeit on pandas 1.2.0 and 1.4.1:

%timeit isdetect_mask = sim.sim_lcs.loc[:100].groupby('ID').apply(is_detect)

This gives 213 ms ± 37.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) for pandas 1.2.0 and 27 s ± 5.48 s per loop (mean ± std. dev. of 7 runs, 1 loop each) for pandas 1.4.1.
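To compare the two pandas versions outside IPython, the %timeit measurement can be approximated with the standard-library timeit module. The snippet below is only a sketch: time_callable and the toy frame are hypothetical names introduced here for illustration, and the toy frame merely stands in for sim.sim_lcs, which is not shown in this post.

```python
import statistics
import timeit

import pandas as pd


def time_callable(func, repeat=7, number=1):
    """Return (mean, stdev) in seconds per call, similar in spirit to %timeit."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


# Hypothetical stand-in for the real data (assumption: 'ID' is an index level).
toy = pd.DataFrame(
    {
        "ID": [0, 0, 1, 1],
        "flux": [10.0, 20.0, 30.0, 40.0],
        "fluxerr": [1.0, 2.0, 3.0, 4.0],
    }
).set_index("ID")

mean_s, std_s = time_callable(lambda: toy.groupby("ID")["flux"].sum())
print(f"pandas {pd.__version__}: {mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms per call")
```

Running the same script in each environment prints the pandas version next to the timing, which makes the two results easier to compare side by side.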

I built a simpler example to try to reproduce this behaviour:

import numpy as np
import pandas as pd

print(pd.__version__)

n = 30000

# Define a pandas dataframe
dic = {'A': [],
       'B' : [],
       'F': [],
       'eF': []}

for i in range(n):
    nsub_idx = np.random.randint(50, 100)
    dic['A'] += [i] * nsub_idx
    dic['B'] += list(range(nsub_idx))
    F = np.random.uniform(10, 100, size=nsub_idx)
    dic['F'] += list(F)
    dic['eF'] += list(np.random.normal(0, np.sqrt(F)))
    
df = pd.DataFrame(dic)
df.set_index(['A', 'B'], inplace=True)

# Function to apply
def _test_(elmt, thr=5., n_det=4):
    test = elmt['F'] / elmt['eF'] > thr
    if test.sum() > n_det :
        return True
    return False

%timeit mask = df.groupby('A').apply(_test_)

However, when I run this example, pandas 1.4.1 performs better than 1.2.0, so I guess the problem comes from my side, but I can't figure out what it is.
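For reference, the same per-group mask can also be computed without apply, which may help to check whether the slowdown is tied to the apply path. This is a minimal sketch, not a diagnosis: detect_mask is a hypothetical helper introduced here, and it assumes the grouping key ('A' in the example, 'ID' in the real data) is an index level.

```python
import pandas as pd


def detect_mask(df, group_level="A", flux_col="F", err_col="eF", thr=5.0, n_det=4):
    """Return one boolean per group: True if more than n_det rows exceed thr."""
    # Row-wise signal-to-noise test, computed once for the whole frame.
    above_thr = (df[flux_col] / df[err_col]) > thr
    # Summing booleans counts the rows that pass the test in each group.
    n_above = above_thr.groupby(level=group_level).sum()
    return n_above > n_det


# Small hand-made frame: group 0 has 6 rows above threshold, group 1 has 2.
small = pd.DataFrame(
    {
        "A": [0] * 6 + [1] * 6,
        "B": list(range(6)) * 2,
        "F": [60.0] * 12,
        "eF": [1.0] * 6 + [1.0, 1.0, 100.0, 100.0, 100.0, 100.0],
    }
).set_index(["A", "B"])

mask = detect_mask(small)
print(mask)
```

For the real data, the call would presumably look like detect_mask(sim.sim_lcs, group_level='ID', flux_col='flux', err_col='fluxerr'), assuming 'ID' is one of the index levels.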



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow