scipy.fftpack.fft with multiprocessing, how to avoid performance losses?

I would like to use scipy.fftpack.fft (and rfft) inside a multiprocessing structure.
I have observed significant performance losses due to an apparent incompatibility between scipy.fftpack and multiprocessing, which makes the parallelization practically useless.
Although the issue seems well known, I could not find a solution on the web to avoid these performance losses.

Below is a minimal example showing the issue:

import time
import multiprocessing as mp
from scipy.fftpack import fft, ifft
import numpy as np


def costly_function(n_mean: int):
    start = time.time()
    x = np.ones(16385, dtype=float)
    for n in range(n_mean):
        fft(ifft(x))
    return (time.time() - start) * 1000.


n_run = 24
# ===== sequential test
sequential_times = [costly_function(500) for _ in range(n_run)]

print(f"time per run (sequential): {np.mean(sequential_times):.2f}+-{np.std(sequential_times):.2f}ms")

# ===== parallel test
with mp.Pool(12) as pool:
    parallel_times = pool.map(costly_function, [500 for _ in range(n_run)])

print(f"time per run (parallel): {np.mean(parallel_times):.2f}+-{np.std(parallel_times):.2f}ms")

On a 12-core machine running Ubuntu and Python 3.10, I get the following results:

>> time per run (sequential): 510.55+-64.64ms
>> time per run (parallel): 1254.26+-114.47ms
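One direction I am considering but have not benchmarked here (so treat it as an assumption on my part) is the newer scipy.fft module, the pocketfft-based replacement for scipy.fftpack, which accepts a `workers` argument for internal parallelism and might sidestep multiprocessing entirely; the value `workers=4` below is an arbitrary choice for illustration:

```python
import numpy as np
from scipy import fft  # pocketfft-based replacement for scipy.fftpack (SciPy >= 1.4)

x = np.ones(16385, dtype=float)
# `workers` lets pocketfft parallelize internally instead of relying on
# multiprocessing; 4 is an arbitrary illustrative value.
y = fft.fft(fft.ifft(x), workers=4)
# The round trip should reproduce x (as a complex array with ~0 imaginary part).
```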

Note: none of the following additions resolved the problem:

import os
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
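For completeness, these thread-limiting variables are only read when the numerical libraries are first loaded, so they must be set before the first numpy/scipy import to have any effect. A minimal sketch of that ordering:

```python
import os

# These variables are read when the BLAS/FFT backends are loaded,
# so they must be set before numpy/scipy are first imported.
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'

import numpy as np
from scipy.fftpack import fft, ifft

x = np.ones(16385, dtype=float)
y = fft(ifft(x))  # round trip with the backends limited to one thread each
```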


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow