Python multiprocessing with TensorRT

I am trying to use a TensorRT engine for inference in a Python class that inherits from multiprocessing.Process. The engine works in a standalone Python script on my system, but now that I am integrating it into the codebase, the multiprocessing used in the class seems to be causing problems.

I am not getting any errors; execution simply skips everything after the line self.runtime = trt.Runtime(self.trt_logger). The VS Code debugger does not step into that call either.

The docs mention the following, which I do not fully understand:

The TensorRT builder may only be used by one thread at a time. If you need to run multiple builds simultaneously, you will need to create multiple builders. The TensorRT runtime can be used by multiple threads simultaneously, so long as each object uses a different execution context.

The following parts of my code are started, joined and terminated from another file:

# more imports
import logging
import multiprocessing
import os

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

class MyClass(multiprocessing.Process):
    def __init__(self, messages):
        multiprocessing.Process.__init__(self)
        # other stuff
        self.exit = multiprocessing.Event()

    def load_tensorrt_model(self, config):
        '''Load tensorrt model with engine'''
        logging.debug('Start')

        # Reading the config parameters related to the engine
        engine_file = os.path.join(config['trt_engine']['trt_folder'], config['trt_engine']['engine_file'])
        class_names_file = os.path.join(config['trt_engine']['trt_folder'], config['trt_engine']['class_names_file'])

        # Verify if all the necessary files are present, if so load the detection network
        if os.path.exists(engine_file) and os.path.exists(class_names_file):
            try:
                logging.debug('In try statement')
                self.trt_logger = trt.Logger()
                with open(engine_file, 'rb') as f:
                    logging.debug('I can get here, but no further')
                    self.runtime = trt.Runtime(self.trt_logger)
                    logging.debug('Cannot get here')
                    self.engine = self.runtime.deserialize_cuda_engine(f.read())
# More stuff                

I found a question about a similar multithreading problem, but so far I have been unable to use it to solve mine.

Any help is appreciated.

System specs:

  • Python 3.6.9
  • Jetson NX
  • JetPack 4.4.1
  • L4T 32.4.4
  • TensorRT 7.1.3.0-1
  • CUDA 10.2
  • Ubuntu 18.04


Solution 1:[1]

I had the same problem. It seems pycuda.autoinit does not work well in a multiprocessing scenario.

Try replacing import pycuda.autoinit with:

cuda.init()
self.cuda_context = cuda.Device(0).make_context()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 lauthu