Dask ProgressBar doesn't work with distributed backend
The progress bar works beautifully when used with the multiprocessing
backend but doesn't seem to work at all when using a distributed
scheduler as the backend.
Is there a way around this? Or another solution? The distributed
package has some progress bars itself but they all require a list of futures to work.
Solution 1:[1]
The key difference is that with the threaded or multiprocessing schedulers, the results are piped back to the controlling thread, whereas with distributed they are computed asynchronously on the cluster (even if that cluster is your local machine). If you previously had code like
from dask.diagnostics import ProgressBar
with ProgressBar():
    out = collection.compute()
Now you can do
from dask.distributed import progress
out = c.compute(collection) # c is the client
progress(out)
To collect your result, call out.result() or c.gather(out).
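For reference, here is a minimal end-to-end sketch of the pattern above. The names slow_square and run_with_progress are hypothetical, the delayed sum stands in for whatever collection you actually have, and processes=False is an assumption made only so the example runs in a single process on one machine.

```python
import time

from dask import delayed
from dask.distributed import Client, progress


@delayed
def slow_square(x):
    # Hypothetical stand-in for real work; the sleep makes progress visible.
    time.sleep(0.05)
    return x * x


def run_with_progress(n=20):
    # Build a lazy collection: nothing is computed yet.
    collection = delayed(sum)([slow_square(i) for i in range(n)])
    # processes=False starts an in-process local cluster (an assumption
    # made here so the sketch runs on a single machine).
    with Client(processes=False) as c:
        out = c.compute(collection)  # returns a Future right away
        progress(out)                # in a console, draws a text bar until done
        return out.result()          # fetch the computed value


if __name__ == "__main__":
    print(run_with_progress())
```

In a Jupyter notebook, progress(out) should instead display a widget and return immediately, so the final out.result() call is what waits for the value.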
Note that the distributed scheduler also serves a graphical dashboard, by default at http://yourhost:8787 (see, for example, the status/ page). There you can watch your tasks being executed without having to invoke a progress bar at all.
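If you are unsure which address the dashboard ended up on, the client can report it. This is a small sketch; get_dashboard_link is a hypothetical helper name, the in-process cluster is again only an assumption for illustration, and the link may be empty if the dashboard's dependencies (such as bokeh) are not installed.

```python
from dask.distributed import Client


def get_dashboard_link():
    # Start a throwaway in-process cluster and ask the client where
    # its dashboard is being served.
    with Client(processes=False) as c:
        return c.dashboard_link


if __name__ == "__main__":
    print(get_dashboard_link())
```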
Solution 2:[2]
A solution is linked in this issue for tqdm (a popular progress bar package), and it will hopefully be merged at some point: https://github.com/tqdm/tqdm/issues/1230
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mdurant |
Solution 2 | Scott |