CUDA crashes under certain situations

I am using PyTorch 1.8.2 LTS + CUDA 11.1 + cuDNN 8.0.4 on Ubuntu 18.04. It crashes oddly in certain situations. The problem can be reproduced as follows.

import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(2, 4).to(device)
b = torch.randn(4, 2).to(device)
c = torch.matmul(a, b)
a = torch.randn(100, 64).to(device)
b = torch.randn(64, 100).to(device)
c = torch.matmul(a, b)
a = torch.randn(1, 100, 64).to(device)
b = torch.randn(1, 64, 100).to(device)
torch.matmul(a, b)

tensor([[[  8.1834,   1.8383,   1.7945,  ...,  12.6254,  12.8753, -11.5631],
         [  3.8268,  -7.1525, -18.0478,  ...,   6.6565, -10.9533,  -9.5993],
         [  5.1988,   9.7145,  -8.7255,  ...,   1.7275,   6.2185, -15.1953],
         ...,
         [ -0.8127,   0.4938,   2.9823,  ...,  -7.6766,   4.4492, -10.5318],
         [ 17.7455,   4.2289,  -1.0179,  ...,   0.5332,   6.9129, -10.6715],
         [ -3.2642,  -1.1587,   0.9091,  ...,  12.3628,   3.0298,   8.1988]]],
       device='cuda:0')

a = torch.randn(2, 100, 64).to(device)
b = torch.randn(2, 64, 100).to(device)
torch.matmul(a, b)

Traceback (most recent call last):
  File "/home/cide/anaconda3/envs/nmmo/lib/python3.9/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`

a = torch.randn(200, 64).to(device)
b = torch.randn(64, 200).to(device)
torch.matmul(a, b)

tensor([[  5.9775,  13.0003,   0.4235,  ...,   4.0292, -17.0294,  -7.5343],
        [  4.3818,  -6.7144,   3.4724,  ...,   6.9620,   2.7793,  16.6472],
        [  1.8819,  -4.0363,   7.1150,  ...,  -6.0632,  -7.7502,   7.5797],
        ...,
        [  1.6404,   8.4560,   8.4408,  ...,  -3.5046,   7.3649,  -7.1671],
        [  2.1508,  14.0800, -10.0840,  ...,  -5.2585,   5.2174, -11.6113],
        [ 14.9206,  -6.9602,  11.4180,  ...,   4.3933,  -9.2923,  -8.2359]],
       device='cuda:0')

It looks as if the error were caused by running out of GPU memory. However, as the nvidia-smi output below shows, less than 20% of the GPU memory is in use.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN Xp     Off  | 00000000:03:00.0  On |                  N/A |
| 26%   45C    P8    15W / 250W |   2034MiB / 12192MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA TITAN Xp     Off  | 00000000:04:00.0 Off |                  N/A |
| 23%   40C    P8    11W / 250W |      4MiB / 12196MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1271      G   /usr/lib/xorg/Xorg                203MiB |
|    0   N/A  N/A      2400      G   /usr/bin/gnome-shell               38MiB |
|    0   N/A  N/A      3652      G   ...AAAAAAAAA= --shared-files       13MiB |
|    0   N/A  N/A     22114      C   ...nda3/envs/nmmo/bin/python      859MiB |
|    0   N/A  N/A     22882      G   ..._22757.log --shared-files       37MiB |
|    0   N/A  N/A     27069      C   ...nda3/envs/nmmo/bin/python      859MiB |
|    0   N/A  N/A     28706      G   ...mviewer/tv_bin/TeamViewer       11MiB |
|    0   N/A  N/A     29739      G   /usr/bin/nvidia-settings            3MiB |
+-----------------------------------------------------------------------------+

Moreover, if lines 3–8 of the reproduction code (the first two matmul examples) are not run, it does not crash.



Solution 1:[1]

Assuming you have the proper version of CUDA installed, there are some options. You can empty the cache:

import gc
import torch

gc.collect()              # free unreferenced Python objects
torch.cuda.empty_cache()  # release cached, unused GPU memory blocks
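
Note that empty_cache() only releases cached blocks that are no longer referenced by any tensor; memory held by live tensors is not freed.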

This can also be a problem with wandb, if you have it installed; in that case:

import os

os.environ["WANDB_SILENT"] = "true"    # suppress wandb console output
os.environ["WANDB_DISABLED"] = "true"  # disable wandb logging entirely
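
For these variables to take effect, set them before wandb is imported or initialized (e.g. before wandb.init() runs).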

Or it may be a problem of memory fragmentation. Solution:

import os

memory = 5800 // 240  # GPU total memory (MB) divided by number of cores
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:{}".format(memory)

If you plan to work with a neural net, you can also decrease the batch size and the number of layers.
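
The same idea applies to the batched matmul from the question: processing a few batch entries at a time keeps peak GPU memory lower. Here is a minimal sketch; the chunked_matmul helper and the chunk size are illustrative choices, not part of the original answer:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def chunked_matmul(a, b, chunk_size=8):
    # Multiply batched matrices a few batch entries at a time to keep
    # peak GPU memory lower than one large batched matmul.
    parts = []
    for start in range(0, a.shape[0], chunk_size):
        parts.append(torch.matmul(a[start:start + chunk_size],
                                  b[start:start + chunk_size]))
    return torch.cat(parts, dim=0)

a = torch.randn(32, 100, 64).to(device)
b = torch.randn(32, 64, 100).to(device)
c = chunked_matmul(a, b)  # same result as torch.matmul(a, b)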

A shape mismatch can also throw errors; be sure to follow the matrix multiplication rule: (n, m) x (m, p) = (n, p)
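
For example, a quick sanity check before the batched matmul from the question (a minimal sketch reusing the shapes above) is to assert that the inner dimensions agree:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(2, 100, 64).to(device)
b = torch.randn(2, 64, 100).to(device)
# Inner dimensions must agree: (..., n, m) @ (..., m, p) -> (..., n, p)
assert a.shape[-1] == b.shape[-2], "inner dimensions do not match"
c = torch.matmul(a, b)  # result shape: (2, 100, 100)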

Sources

[1] Source: Stack Overflow

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.