'Using multiprocessing pool from celery task raises exception
FOR THOSE READING THIS: I have decided to use RQ instead which doesn't fail when running code that uses the multiprocessing module. I suggest you use that.
I am trying to use a multiprocessing pool from within a celery task using Python 3 and redis as the broker (running it on a Mac). However, I don't seem to be able to even create a multiprocessing Pool object from within the Celery task! Instead, I get a strange exception that I really don't know what to do with.
Can anyone tell me how to accomplish this?
The task:
from celery import Celery
from multiprocessing.pool import Pool
app = Celery('tasks', backend='redis', broker='redis://localhost:6379/0')
@app.task
def test_pool():
with Pool() as pool:
# perform some task using the pool
pool.close()
return 'Done!'
which I add to Celery using:
celery -A tasks worker --loglevel=info
and then running it via the following python script:
import tasks
tasks.test_pool.delay()
that returns the following celery output:
[2015-01-12 15:08:57,571: INFO/MainProcess] Connected to redis://localhost:6379/0
[2015-01-12 15:08:57,583: INFO/MainProcess] mingle: searching for neighbors
[2015-01-12 15:08:58,588: INFO/MainProcess] mingle: all alone
[2015-01-12 15:08:58,598: WARNING/MainProcess] [email protected] ready.
[2015-01-12 15:09:02,425: INFO/MainProcess] Received task: tasks.test_pool[38cab553-3a01-4512-8f94-174743b05369]
[2015-01-12 15:09:02,436: ERROR/MainProcess] Task tasks.test_pool[38cab553-3a01-4512-8f94-174743b05369] raised unexpected: AttributeError("'Worker' object has no attribute '_config'",)
Traceback (most recent call last):
File "/usr/local/lib/python3.4/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.4/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/Users/simongray/Code/etilbudsavis/offer-sniffer/tasks.py", line 17, in test_pool
with Pool() as pool:
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 150, in __init__
self._setup_queues()
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/pool.py", line 243, in _setup_queues
self._inqueue = self._ctx.SimpleQueue()
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/context.py", line 111, in SimpleQueue
return SimpleQueue(ctx=self.get_context())
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/queues.py", line 336, in __init__
self._rlock = ctx.Lock()
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/context.py", line 66, in Lock
return Lock(ctx=self.get_context())
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/synchronize.py", line 163, in __init__
SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/synchronize.py", line 59, in __init__
kind, value, maxvalue, self._make_name(),
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/synchronize.py", line 117, in _make_name
return '%s-%s' % (process.current_process()._config['semprefix'],
AttributeError: 'Worker' object has no attribute '_config'
Solution 1:[1]
This is a known issue with celery. It stems from an issue introduced in the billiard dependency. A work-around is to manually set the _config
attribute for the current process. Thanks to user @martinth for the work-around below.
from celery.signals import worker_process_init
from multiprocessing import current_process
@worker_process_init.connect
def fix_multiprocessing(**kwargs):
try:
current_process()._config
except AttributeError:
current_process()._config = {'semprefix': '/mp'}
The worker_process_init
hook will execute the code upon worker process initialization. We simply check to see if _config
exists, and set it if it does not.
Solution 2:[2]
A quick solution is to use the thread-based "dummy" multiprocessing
implementation. Change
from multiprocessing import Pool # or whatever you're using
to
from multiprocessing.dummy import Pool
However since this parallelism is thread-based, the usual caveats (GIL) apply.
Solution 3:[3]
Via a useful comment in the Celery issue report linked to in Davy's comment, I was able to solve this by importing the billiard
module's Pool
class instead.
Replace
from multiprocessing import Pool
with
from billiard.pool import Pool
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Razzi Abuissa |
Solution 3 | mochatiger |