CUDA crashes in a specific situation
I am using PyTorch 1.8.2 LTS + CUDA 11.1 + cuDNN 8.0.4 on Ubuntu 18.04. It crashes oddly in one specific situation. The problem can be reproduced as follows.
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(2, 4).to(device)
b = torch.randn(4, 2).to(device)
c = torch.matmul(a, b)
a = torch.randn(100, 64).to(device)
b = torch.randn(64, 100).to(device)
c = torch.matmul(a, b)
a = torch.randn(1, 100, 64).to(device)
b = torch.randn(1, 64, 100).to(device)
torch.matmul(a, b)
tensor([[[ 8.1834, 1.8383, 1.7945, ..., 12.6254, 12.8753, -11.5631],
[ 3.8268, -7.1525, -18.0478, ..., 6.6565, -10.9533, -9.5993],
[ 5.1988, 9.7145, -8.7255, ..., 1.7275, 6.2185, -15.1953],
...,
[ -0.8127, 0.4938, 2.9823, ..., -7.6766, 4.4492, -10.5318],
[ 17.7455, 4.2289, -1.0179, ..., 0.5332, 6.9129, -10.6715],
[ -3.2642, -1.1587, 0.9091, ..., 12.3628, 3.0298, 8.1988]]],
device='cuda:0')
a = torch.randn(2, 100, 64).to(device)
b = torch.randn(2, 64, 100).to(device)
torch.matmul(a, b)
Traceback (most recent call last):
File "/home/cide/anaconda3/envs/nmmo/lib/python3.9/code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
a = torch.randn(200, 64).to(device)
b = torch.randn(64, 200).to(device)
torch.matmul(a, b)
tensor([[ 5.9775, 13.0003, 0.4235, ..., 4.0292, -17.0294, -7.5343],
[ 4.3818, -6.7144, 3.4724, ..., 6.9620, 2.7793, 16.6472],
[ 1.8819, -4.0363, 7.1150, ..., -6.0632, -7.7502, 7.5797],
...,
[ 1.6404, 8.4560, 8.4408, ..., -3.5046, 7.3649, -7.1671],
[ 2.1508, 14.0800, -10.0840, ..., -5.2585, 5.2174, -11.6113],
[ 14.9206, -6.9602, 11.4180, ..., 4.3933, -9.2923, -8.2359]],
device='cuda:0')
It seems that the error could be caused by running out of GPU memory. However, my GPU memory usage is less than 20%, as the nvidia-smi output below shows (a PyTorch-side check follows it).
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA TITAN Xp Off | 00000000:03:00.0 On | N/A |
| 26% 45C P8 15W / 250W | 2034MiB / 12192MiB | 8% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA TITAN Xp Off | 00000000:04:00.0 Off | N/A |
| 23% 40C P8 11W / 250W | 4MiB / 12196MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1271 G /usr/lib/xorg/Xorg 203MiB |
| 0 N/A N/A 2400 G /usr/bin/gnome-shell 38MiB |
| 0 N/A N/A 3652 G ...AAAAAAAAA= --shared-files 13MiB |
| 0 N/A N/A 22114 C ...nda3/envs/nmmo/bin/python 859MiB |
| 0 N/A N/A 22882 G ..._22757.log --shared-files 37MiB |
| 0 N/A N/A 27069 C ...nda3/envs/nmmo/bin/python 859MiB |
| 0 N/A N/A 28706 G ...mviewer/tv_bin/TeamViewer 11MiB |
| 0 N/A N/A 29739 G /usr/bin/nvidia-settings 3MiB |
+-----------------------------------------------------------------------------+
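For reference, the memory headroom can also be checked from inside PyTorch itself (a minimal sketch, assuming device index 0):
import torch

# Compare what PyTorch has allocated/reserved on GPU 0 with the card's total capacity.
props = torch.cuda.get_device_properties(0)
allocated = torch.cuda.memory_allocated(0)
reserved = torch.cuda.memory_reserved(0)
print(f"allocated {allocated / 2**20:.0f} MiB, "
      f"reserved {reserved / 2**20:.0f} MiB, "
      f"total {props.total_memory / 2**20:.0f} MiB")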
Moreover, if lines 3–8 of the code above are not run, it does not crash.
Solution 1:
Assuming you have the proper version of CUDA installed, there are a few options. You can empty the cache:
import gc
gc.collect()
torch.cuda.empty_cache()
This can also be a problem with wandb, if you have it installed. A possible fix:
import os
os.environ["WANDB_SILENT"] = "true"
os.environ["WANDB_DISABLED"] = "true"
Or it may be a problem of memory fragmentation. A possible fix:
memory = 5800 / 240  # GPU total memory / number of cores
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:{}".format(int(memory))
If you plan to work with a neural net, you can also decrease the batch size and the number of layers, as sketched below.
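For instance, if you load data with a DataLoader, lowering batch_size reduces the peak memory of each forward/backward pass (a minimal sketch with placeholder data, not taken from the question):
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; only the batch_size matters for memory here.
dataset = TensorDataset(torch.randn(1000, 64), torch.randn(1000, 1))
loader = DataLoader(dataset, batch_size=16, shuffle=True)  # e.g. 16 instead of 64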
A shape mismatch can also throw errors; be sure to follow the rules of matrix multiplication: (n, m) × (m, p) = (n, p).
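For example (a minimal sketch of the (n, m) × (m, p) rule, with illustrative shapes):
import torch

a = torch.randn(3, 5)   # (n, m)
b = torch.randn(5, 2)   # (m, p)
c = torch.matmul(a, b)
print(c.shape)          # torch.Size([3, 2]), i.e. (n, p)
# torch.matmul(b, a) would raise a shape-mismatch RuntimeError (2 != 3).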
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow