Detectron2: Speed up instance segmentation inference

I have a working instance segmentation setup using the "mask_rcnn_R_101_FPN_3x" model. When I run inference on an image it takes about 3 seconds per image on a GPU. How can I speed it up?

I am running the code in Google Colab.

This is my setup config:

import os

from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.OUTPUT_DIR = "/content/drive/MyDrive/TEAM/save/"
cfg.DATASETS.TRAIN = (train_name,)
cfg.DATASETS.TEST = (test_name,)
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")

This is the inference code:

import time

import cv2
import torch
from detectron2.engine import DefaultPredictor

torch.backends.cudnn.benchmark = True
start = time.time()

# Note: the timer covers building the predictor and reading the image from
# Drive as well as the forward pass itself.
predictor = DefaultPredictor(cfg)

im = cv2.imread("/content/drive/MyDrive/TEAM/mcocr_val_145114ixmyt.jpg")

outputs = predictor(im)

print(f"Inference time per image is : {(time.time() - start)} s")

The printed output:

Inference time per image is : 2.7835421562194824 s

The image I run inference on is 1024 x 1024 pixels. I have tried different sizes, but it still takes about 3 seconds per image. Am I missing anything about Detectron2?

More information about the GPU: [screenshot of GPU info]



Solution 1:[1]

These are the two best ways to decrease inference time:

  1. Use a better GPU.
  2. Use a shallower network, for example R50; compare the inference times in the model zoo: https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md (a config sketch follows at the end of this answer).

Decreasing the image size will not decrease the inference time: Mask R-CNN has the same number of parameters regardless of the image size, and the DefaultPredictor resizes every input to the configured test resolution anyway, so the per-image cost stays roughly the same.
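For reference, here is a minimal sketch (not from the original answer) of pointing the question's setup at the lighter R50 backbone. Note that the R101 model_final.pth will not load into an R50 backbone, so the model would need to be retrained; the weights path below is hypothetical.

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# Swap the R101 config for the shallower R50 variant.
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
# Hypothetical path: weights trained with the R50 backbone.
cfg.MODEL.WEIGHTS = "/content/drive/MyDrive/TEAM/save/model_final_r50.pth"
predictor = DefaultPredictor(cfg)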

Solution 2:[2]

There is a third way: use a faster toolkit for inference, e.g. OpenVINO. OpenVINO is optimized specifically for Intel hardware, but it should work with any CPU. It optimizes your model by converting it to Intermediate Representation (IR), performing graph pruning, and fusing some operations into others while preserving accuracy. It then uses vectorization at runtime.

If you are able to export the Detectron2 model to ONNX, you can use OpenVINO. You can find a full tutorial on how to convert the ONNX model, along with a performance comparison, here. Some snippets are below.
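For completeness, one possible way to get an ONNX file out of Detectron2 is tracing the model with detectron2.export.TracingAdapter; Detectron2 also ships a ready-made script for this (tools/deploy/export_model.py in its repository). The sketch below is an assumption based on that script, not part of the original answer: not every model exports cleanly, and the sample image path and opset version are placeholders.

import cv2
import torch
from detectron2.engine import DefaultPredictor
from detectron2.export import TracingAdapter

predictor = DefaultPredictor(cfg)  # cfg as defined in the question
model = predictor.model.eval()

# Any representative image works for tracing; the path here is a placeholder.
im = cv2.imread("/content/sample.jpg")
image = torch.as_tensor(im.astype("float32").transpose(2, 0, 1))

# TracingAdapter flattens Detectron2's dict-based inputs/outputs into plain
# tensors so the model can be traced and exported to ONNX.
traceable = TracingAdapter(model, [{"image": image}])
with open("model.onnx", "wb") as f:
    torch.onnx.export(traceable, (image,), f, opset_version=11)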

Install OpenVINO

The easiest way is to install it with pip, especially when you use Google Colab.

pip install openvino-dev[onnx]

Use Model Optimizer to convert the ONNX model

The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the ONNX model to IR, which is the default format for OpenVINO. You can also try FP16 precision, which should give you better performance (just change data_type). Run in the command line:

mo --input_model "model.onnx" --input_shape "[1, 3, 224, 224]" --mean_values="[123.675, 116.28, 103.53]" --scale_values="[58.395, 57.12, 57.375]" --data_type FP32 --output_dir "model_ir"

Run the inference

The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (the graphics integrated into your CPU, such as Intel HD Graphics).

from openvino.runtime import Core

# Load the network and compile it for the target device
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get the output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
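The input_image used above has to match the shape the IR was converted with. Below is a minimal sketch, assuming the 1x3x224x224 NCHW layout from the mo command above and RGB channel order; the mean/scale normalization is already baked into the IR by Model Optimizer, and the image path is a placeholder.

import cv2
import numpy as np

# Read the image, convert BGR -> RGB, resize to the IR's spatial size,
# then reorder to NCHW with a batch dimension of 1.
image = cv2.cvtColor(cv2.imread("/content/sample.jpg"), cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (224, 224))
input_image = np.expand_dims(image.transpose(2, 0, 1), 0).astype(np.float32)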

Disclaimer: I work on OpenVINO.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: gap210
[2] Solution 2: dragon7