CUDA illegal memory access when running inference on *.engine
After exporting a YOLOv5 model to .engine, I receive an error when trying to run inference on it.
Loading model.engine for TensorRT inference...
[01/16/2022-04:18:26] [TRT] [I] [MemUsageChange] Init CUDA: CPU +426, GPU +0, now: CPU 520, GPU 3258 (MiB)
[01/16/2022-04:18:26] [TRT] [I] Loaded engine size: 28 MiB
[01/16/2022-04:18:26] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/16/2022-04:18:26] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +634, GPU +266, now: CPU 1193, GPU 3552 (MiB)
[01/16/2022-04:18:27] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1319, GPU 3610 (MiB)
[01/16/2022-04:18:27] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +26, now: CPU 0, GPU 26 (MiB)
[01/16/2022-04:18:31] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/16/2022-04:18:31] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +10, now: CPU 5022, GPU 5368 (MiB)
[01/16/2022-04:18:31] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 5022, GPU 5376 (MiB)
[01/16/2022-04:18:31] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +36, now: CPU 0, GPU 62 (MiB)
Adding AutoShape...
[01/16/2022-04:18:32] [TRT] [E] 1: [convolutionRunner.cpp::executeConv::511] Error Code 1: Cudnn (CUDNN_STATUS_EXECUTION_FAILED)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/torch/hub/ultralytics_yolov5_master/models/common.py", line 539, in forward
    t.append(time_sync())
  File "/root/.cache/torch/hub/ultralytics_yolov5_master/utils/torch_utils.py", line 91, in time_sync
    torch.cuda.synchronize()
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 493, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered
[01/16/2022-04:18:32] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
Code:

import cv2
import numpy as np
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.enabled = False
model = torch.hub.load('ultralytics/yolov5', 'custom', 'model.engine', force_reload=True)
model.to(device)
model.half()
image = open("find.png", 'rb').read()
original_image = cv2.imdecode(np.frombuffer(image, np.uint8), cv2.IMREAD_COLOR)
resized_image = cv2.resize(original_image, (320, 320))
with torch.inference_mode():
    results = model(resized_image, size=320)
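
One debugging note (not from the original post): CUDA errors are reported asynchronously, so the traceback above blames torch.cuda.synchronize() even though the illegal access happened in an earlier kernel. Forcing synchronous launches usually pins down the real failure site:

# Debugging sketch: makes CUDA report errors at the failing kernel
# instead of at the next synchronization point.
# Must be set before any CUDA work (ideally before importing torch).
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'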
GPU: NVIDIA A100
CUDA: compilation tools release 11.1, V11.1.105 (Build cuda_11.1.TC455_06.29190527_0)
Torch: 1.9.1+cu111
nvidia-tensorrt: 8.2.1.8
Solution 1:
Yes, this is a known YOLOv5 TRT issue with AutoShape. The following code works correctly:
python export.py --weights yolov5s.pt --include engine
python detect.py --weights yolov5s.engine
But if we use the same model for AutoShape inference, we get the CUDA error you mentioned above. I have no idea why; I've looked into it several times and can't find the cause. If you have any ideas or discover a solution, please let us know!
Linked to https://github.com/ultralytics/yolov5/issues/7128#issuecomment-1107465204
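
Until the AutoShape path is fixed, a workaround consistent with the answer above is to bypass AutoShape and feed the engine an explicit tensor, the way detect.py does. Below is a minimal sketch, assuming a local YOLOv5 checkout and an engine exported at 320x320 (TensorRT engines have static input shapes, so the export size must match the inference size); the file names are placeholders:

import cv2
import numpy as np
import torch

from models.common import DetectMultiBackend
from utils.augmentations import letterbox
from utils.general import non_max_suppression

device = torch.device('cuda:0')
model = DetectMultiBackend('model.engine', device=device)  # same loader detect.py uses

img0 = cv2.imread('find.png')                               # BGR, HWC
img = letterbox(img0, 320, auto=False)[0]                   # pad/resize to the fixed 320x320 engine input
img = np.ascontiguousarray(img.transpose((2, 0, 1))[::-1])  # HWC BGR -> CHW RGB

im = torch.from_numpy(img).to(device).float() / 255.0       # normalize to 0-1 (use .half() only for an FP16 engine)
im = im[None]                                               # add batch dimension

pred = model(im)                                            # raw predictions
det = non_max_suppression(pred)[0]                          # (n, 6) tensor: xyxy, conf, cls
print(det)

This mirrors the preprocessing in detect.py, which the answer reports as working, so it sidesteps whatever AutoShape does differently.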
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Glenn Jocher |