How to avoid excessive RAM consumption using pathos
This is a rough example of how I leverage multiprocessing with pathos:
```python
from pathos.multiprocessing import ProcessingPool

# A pool of 10 worker processes; each call to func runs in one of them.
pool = ProcessingPool(10)
results = pool.map(func, args)
```
Each run of func can take a while. Let's say it's 5 minutes, and len(args) == 20. With 10 workers and 20 tasks, the work runs in two waves, so it takes around 10 minutes to finish.
During this period RAM usage grows steadily, and the memory is only freed once all the work is done.
The main question: how do I change the approach so that memory is freed each time a process finishes a task, instead of waiting for all of them to finish? Otherwise, with 100 args the total RAM consumption would be 5 times higher than with 20 args, even though the work is computed in parallel chunks of 10 either way.
Besides, the reason for the memory growth is unclear: I allocate memory at the start of func, yet usage keeps growing over time. The return value of func is always 0; I store the results on disk.
Also, is there a way to have a few arrays reside in a shared memory area, so that each process doesn't have to make its own copy?
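On that last sub-question, one option is the standard library's multiprocessing.shared_memory module (Python 3.8+). A minimal sketch, assuming the data is a NumPy array; the names here (src, SHM_NAME, and the body of func) are illustrative, not the original code:

```python
import numpy as np
from multiprocessing import shared_memory
from pathos.multiprocessing import ProcessingPool

# Create one shared block and copy the array into it (done once, in the parent).
src = np.random.rand(1000, 1000)
shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
buf = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
buf[:] = src

SHM_NAME, SHAPE, DTYPE = shm.name, src.shape, src.dtype

def func(i):
    # Attach to the existing block instead of copying the array per process.
    existing = shared_memory.SharedMemory(name=SHM_NAME)
    arr = np.ndarray(SHAPE, dtype=DTYPE, buffer=existing.buf)
    arr.sum()         # stand-in for the real work
    existing.close()  # detach from the block; does not free it
    return 0

pool = ProcessingPool(10)
results = pool.map(func, range(20))

shm.close()
shm.unlink()  # free the shared block once all workers are done
```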
Solution 1:[1]
I was making plots inside my func. To draw them, I used matplotlib.pyplot's interface. It turns out that matplotlib keeps references to every figure created via pyplot, which is why the RAM was never released. So the issue has nothing to do with pathos; closing the figures solved it.
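A minimal sketch of that fix, assuming func saves each plot to disk; the plotting calls and output filename are illustrative, not the original code:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, typical for worker processes
import matplotlib.pyplot as plt

def func(arg):
    fig, ax = plt.subplots()
    ax.plot(range(10))                # stand-in for the real plotting
    fig.savefig(f"plot_{arg}.png")    # illustrative output path
    plt.close(fig)  # drop pyplot's internal reference so the figure can be freed
    return 0
```

plt.close("all") works as well if a single call creates several figures.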
Solution 2:[2]
The built-in multiprocessing pool supports this via the maxtasksperchild argument. It's possible to use that class via pathos, as described in a pathos GitHub issue:
```python
import pathos

# _ProcessPool wraps the stdlib multiprocessing.pool.Pool, so it accepts
# maxtasksperchild; each worker is replaced after completing that many tasks.
pool = pathos.pools._ProcessPool(processes=10, maxtasksperchild=1)
results = pool.map(func, args)
```
Setting maxtasksperchild to 1 should mean that each worker process is restarted after every task, so its memory is returned to the OS as soon as the task finishes.
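Note that respawning a worker after every task adds process start-up overhead. With tasks that run for minutes, as here, that cost is negligible, but for many short tasks a larger maxtasksperchild value may be a better trade-off.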
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ilya Chernov |
| Solution 2 | Iain Shelvington |