'Multiprocessing, file not found
I'm using AlphaPose from GitHub and I'd like to run the script script/demo_inference.py from another script I created in AlphaPose root called run.py. In run.py I imported demo_inference.py as ap using this script:
def import_module_by_path(path):
name = os.path.splitext(os.path.basename(path))[0] spec =
importlib.util.spec_from_file_location(name, path) mod =
importlib.util.module_from_spec(spec) spec.loader.exec_module(mod) return mod
and
ap = import_module_by_path('./scripts/demo_inference.py')
Then, in demo_inference.py I substituted
if __name__ == "__main__":
with
def startAlphapose():
and in run.py I wrote
ap.StartAlphapose().
Now I got this error:
Load SE Resnet...
Loading YOLO model..
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/vislab/guerri/alphagastnet/insieme/alphapose/utils/detector.py", line 251, in image_postprocess
(orig_img, im_name, boxes, scores, ids, inps, cropped_boxes) = self.wait_and_get(self.det_queue)
File "/home/vislab/guerri/alphagastnet/insieme/alphapose/utils/detector.py", line 121, in wait_and_get
return queue.get()
File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/vislab/guerri/alphagastnet/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 487, in Client
c = SocketClient(address)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
What does it mean?
Solution 1:[1]
We were running into this same problem in our cluster.
When using multiprocessing in PyTorch (typically to run multiple DataLoader workers), the subprocesses create sockets in the /tmp
directory to communitcate with each other. These sockets all saved in folders named pymp-######
and look like 0-byte files. Deleting these files or folders while your PyTorch scripts are still running will cause the above error.
In our case, the problem was a buggy maintenance script that was erasing files out of the /tmp
folder while they were still needed. It's possible there are other ways to trigger this error. But you should start by looking for those sockets and making sure they aren't getting erased by accident.
If that doesn't solve it, take a look at your /var/log/syslog
file at the exact time when the error occurred. You'll very likely find the cause of it there.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | markcoatsworth |