Training YOLOv5 on an RTX 3060 Ti GPU raises "RuntimeError: Unable to find a valid cuDNN algorithm to run convolution"
I am training YOLOv5 with --img 1088 and batch size 16 on an RTX 3060 Ti GPU using the following command:
python train.py --img 1088 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --device 0 --workers 0
I get the exception "RuntimeError: Unable to find a valid cuDNN algorithm to run convolution". If I reduce the batch size to 8, the model trains without error. The full traceback:
File "train.py", line 611, in <module>
main(opt)
File "train.py", line 509, in main
train(opt.hyp, opt, device)
File "train.py", line 311, in train
pred = model(imgs) # forward
File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\hamza.m\workspace\yolov5\models\yolo.py", line 123, in forward
return self.forward_once(x, profile, visualize) # single-scale inference, train
File "C:\Users\hamza.m\workspace\yolov5\models\yolo.py", line 155, in forward_once
x = m(x) # run
File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\hamza.m\workspace\yolov5\models\common.py", line 137, in forward
return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\hamza.m\workspace\yolov5\models\common.py", line 45, in forward
return self.act(self.bn(self.conv(x)))
File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\conv.py", line 423, in forward
return self._conv_forward(input, self.weight)
File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\conv.py", line 419, in _conv_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
P.S. Can anyone also explain how to evaluate which GPU is best suited for training my model?
Solution 1:[1]
The answer is in the error log:
RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; 5.48 GiB already allocated; 81.94 MiB free; 5.61 GiB reserved in total by PyTorch)
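PyTorch is trying to allocate more memory than your GPU has free: the request is for 100 MiB, but only 81.94 MiB of the 8 GiB is still available. The cuDNN error appears to be a symptom of this out-of-memory condition.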
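As a rough illustration of why halving the batch size helps, activation memory in a convolutional network grows approximately in proportion to batch size times the number of pixels per image. The sketch below is a back-of-the-envelope estimate under that assumption, not a measurement. `relative_activation_memory` is a hypothetical helper, and the reference run (batch 16 at YOLOv5's default image size of 640) is chosen only for comparison.

```python
def relative_activation_memory(batch, img, ref_batch=16, ref_img=640):
    """Approximate activation memory relative to a reference run.

    Assumption: memory scales linearly with batch size and with the
    pixel count (img * img). Weights, optimizer state, and the cuDNN
    workspace are ignored, so treat the result as a rough guide only.
    """
    return (batch * img * img) / (ref_batch * ref_img * ref_img)


# The failing and the working configuration from the question.
failing = relative_activation_memory(batch=16, img=1088)
working = relative_activation_memory(batch=8, img=1088)

print(f"--img 1088 --batch 16: {failing:.2f}x the reference")
print(f"--img 1088 --batch 8:  {working:.2f}x the reference")
```

Under this assumption, lowering --img reduces memory faster than lowering --batch, because the image size enters quadratically.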
Solution 2:[2]
Try reducing the batch size. I had the same problem, and it went away once I reduced the batch size.
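If you do not want to find the limit by hand, the search can be automated. The following is a minimal sketch, not part of YOLOv5: `largest_working_batch` and `fake_step` are hypothetical names, and `fake_step` merely simulates a GPU that fails above batch size 8. In a real PyTorch loop, `run_step` would execute one forward/backward pass, and you would probably also call `torch.cuda.empty_cache()` between attempts.

```python
def largest_working_batch(run_step, start_batch, min_batch=1):
    """Halve the batch size until run_step(batch) stops failing on memory.

    run_step is any callable that raises RuntimeError when the batch
    does not fit. Only memory-related errors trigger a retry; any other
    RuntimeError is re-raised unchanged.
    """
    batch = start_batch
    while batch >= min_batch:
        try:
            run_step(batch)
            return batch
        except RuntimeError as err:
            message = str(err)
            if "out of memory" not in message and "cuDNN" not in message:
                raise
            batch //= 2  # try again with half the batch
    raise RuntimeError("no batch size fit in memory")


def fake_step(batch):
    # Stand-in for one training step: pretend anything above 8 runs out of memory.
    if batch > 8:
        raise RuntimeError("Unable to find a valid cuDNN algorithm to run convolution")


print(largest_working_batch(fake_step, 16))
```

The result can then be passed to train.py through --batch.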
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Oscar Rangel |
Solution 2 | El Mehdi Tafik |