Multiprocessing, file not found

I'm using AlphaPose from GitHub and I'd like to run the script scripts/demo_inference.py from another script, run.py, which I created in the AlphaPose root. In run.py I imported demo_inference.py as ap using this function:

import importlib.util
import os

def import_module_by_path(path):
  # Use the file name without its extension as the module name.
  name = os.path.splitext(os.path.basename(path))[0]
  spec = importlib.util.spec_from_file_location(name, path)
  mod = importlib.util.module_from_spec(spec)
  spec.loader.exec_module(mod)
  return mod

and

ap = import_module_by_path('./scripts/demo_inference.py')

Then, in demo_inference.py I substituted

if __name__ == "__main__": 

with

def startAlphapose(): 

and in run.py I wrote

ap.startAlphapose()

Now I get this error:

Load SE Resnet...
Loading YOLO model..
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/vislab/guerri/alphagastnet/insieme/alphapose/utils/detector.py", line 251, in image_postprocess
    (orig_img, im_name, boxes, scores, ids, inps, cropped_boxes) = self.wait_and_get(self.det_queue)
  File "/home/vislab/guerri/alphagastnet/insieme/alphapose/utils/detector.py", line 121, in wait_and_get
    return queue.get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/vislab/guerri/alphagastnet/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

What does it mean?



Solution 1:[1]

We were running into this same problem in our cluster.

When using multiprocessing in PyTorch (typically to run multiple DataLoader workers), the subprocesses create sockets in the /tmp directory to communicate with each other. These sockets are all saved in folders named pymp-###### and look like 0-byte files. Deleting these files or folders while your PyTorch scripts are still running will cause the above error.
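To check whether those sockets are still present while your job is running, something like the following may help. It's only a minimal sketch: find_pymp_sockets is a hypothetical helper name (not part of PyTorch or AlphaPose), and it assumes a Linux system where the pymp-* folders sit directly under the directory returned by tempfile.gettempdir(), which is normally /tmp.

```python
import glob
import os
import stat
import tempfile


def find_pymp_sockets(tmp_root=None):
    """Return {pymp_folder: [socket paths]} for each pymp-* folder in tmp_root.

    Hypothetical helper for illustration only.
    """
    # tempfile.gettempdir() honours TMPDIR and usually resolves to /tmp on Linux.
    tmp_root = tmp_root or tempfile.gettempdir()
    found = {}
    for folder in glob.glob(os.path.join(tmp_root, "pymp-*")):
        if not os.path.isdir(folder):
            continue
        sockets = []
        try:
            entries = os.listdir(folder)
        except OSError:
            # Folder belongs to another user or was removed in the meantime.
            continue
        for entry in entries:
            path = os.path.join(folder, entry)
            try:
                if stat.S_ISSOCK(os.lstat(path).st_mode):
                    sockets.append(path)
            except OSError:
                # The entry vanished between listdir() and lstat().
                pass
        found[folder] = sorted(sockets)
    return found


if __name__ == "__main__":
    for folder, sockets in find_pymp_sockets().items():
        print(folder, sockets)
```

If you run this a few times during training and a folder that was there earlier disappears before the job ends, something outside your script is probably cleaning it up.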

In our case, the problem was a buggy maintenance script that was erasing files from the /tmp folder while they were still needed. There may be other ways to trigger this error, but you should start by looking for those sockets and making sure they aren't getting erased by accident.
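If you can't stop the cleanup job itself, one possible workaround (not something we tested on our cluster) is to point Python's temporary directory somewhere the job doesn't touch. As far as I can tell, multiprocessing creates its pymp-* folders through the tempfile module, so they follow tempfile.gettempdir(). The sketch below assumes that behaviour; use_private_tmpdir and the example path are hypothetical, and it has to run before any processes or queues are created.

```python
import os
import tempfile


def use_private_tmpdir(path):
    """Make `path` the temp directory for this process and its children.

    Hypothetical helper; call it at the very top of your entry script,
    before anything creates multiprocessing queues or processes.
    """
    os.makedirs(path, exist_ok=True)
    # Child processes inherit the environment, so they see the same TMPDIR.
    os.environ["TMPDIR"] = path
    # tempfile caches its choice after first use; override the cached value.
    tempfile.tempdir = path
    return tempfile.gettempdir()


# Example call (the path is an assumption, pick one your cleanup job ignores):
# use_private_tmpdir(os.path.expanduser("~/.cache/pymp-tmp"))
```

Setting TMPDIR in the shell before launching the script should have the same effect without any code changes. Keep the path short, since Unix socket paths have a fairly small length limit.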

If that doesn't solve it, take a look at your /var/log/syslog file at the exact time when the error occurred. You'll very likely find the cause of it there.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 markcoatsworth