PyTorch with CUDA local installation fails

I am trying to install PyTorch with CUDA. I followed the instructions (installation using conda) mentioned in https://pytorch.org/get-started/locally/

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

The conda install command runs without any errors.

conda list displays the following:

# Name                    Version                   Build  Channel

cudatoolkit               11.3.1               h2bc3f7f_2
pytorch                   1.11.0          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
torch                     1.10.2                   pypi_0    pypi
torchaudio                0.11.0               py39_cu113    pytorch
torchvision               0.11.3                   pypi_0    pypi

But when I check whether the GPU driver and CUDA are enabled and accessible by PyTorch,

torch.cuda.is_available()

returns False.
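A slightly fuller check than `torch.cuda.is_available()` alone can help narrow down where things break. The sketch below is illustrative only: `cuda_report` is a hypothetical helper name, while `torch.__version__`, `torch.version.cuda` (the CUDA version the installed torch binary was built with, `None` for CPU-only builds), `torch.cuda.is_available()` and `torch.cuda.device_count()` are standard PyTorch attributes and functions. It reports gracefully if torch cannot be imported at all.

```python
def cuda_report() -> dict:
    """Collect basic facts about PyTorch's view of CUDA (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # torch is not importable in this environment.
        return {"torch_installed": False}

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this torch binary was built with; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        # Number of CUDA devices PyTorch reports.
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Running the file directly prints one `key: value` line per entry.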

Before installing PyTorch, I checked and confirmed the prerequisites listed in:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions

Here are my Ubuntu server details:

Environment:

  • OS/kernel:

Ubuntu 18.04.6 LTS (GNU/Linux 4.15.0-154-generic x86_64)

The footnote under Table 1 (Native Linux Distribution Support in CUDA 11.6) of the installation guide states:

For Ubuntu LTS on x86-64, the Server LTS kernel (e.g. 4.15.x for 18.04) is supported in CUDA 11.6.

  • GCC

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

  • GLIBC

ldd (Ubuntu GLIBC 2.27-3ubuntu1.5) 2.27

  • GPU

GeForce GTX 1080 Ti

  • Kernel headers and development packages

$ uname -r
4.15.0-176-generic

As I understand it, installing PyTorch with CUDA through conda also installs the CUDA driver.

I am not sure where I went wrong. Thanks in advance.

EDIT:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

nvcc shows CUDA version 9.1

whereas

$ nvidia-smi
Wed May 11 06:44:31 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 25%   40C    P8    11W / 250W |     18MiB / 11177MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   40C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 25%   35C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4119      G   /usr/lib/xorg/Xorg                             9MiB |
|    0      4238      G   /usr/bin/gnome-shell                           6MiB |
+-----------------------------------------------------------------------------+

nvidia-smi shows CUDA version 10.0

This article (https://varhowto.com/check-cuda-version/) mentions that nvcc reports the version of the CUDA toolkit, whereas nvidia-smi reports on the NVIDIA driver.
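To see the two numbers side by side, they can be extracted from each tool's text output. This is a minimal sketch, assuming the output format shown above (other nvcc or nvidia-smi releases may format their output differently); `toolkit_version` and `driver_cuda_version` are hypothetical helper names, and the sample strings are copied from the question.

```python
import re
from typing import Optional

# Sample outputs copied from the question above.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85"""

SMI_HEADER = "| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |"


def toolkit_version(nvcc_text: str) -> Optional[str]:
    """CUDA toolkit version from `nvcc --version` (the 'release X.Y' field)."""
    match = re.search(r"release (\d+\.\d+)", nvcc_text)
    return match.group(1) if match else None


def driver_cuda_version(smi_text: str) -> Optional[str]:
    """Highest CUDA version the installed driver supports, per the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_text)
    return match.group(1) if match else None


print(toolkit_version(NVCC_OUTPUT))     # 9.1
print(driver_cuda_version(SMI_HEADER))  # 10.0
```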

Q1: Does this mean there are two different CUDA installations at the system-wide level?

Nvidia Cudatoolkit vs Conda Cudatoolkit (Stack Overflow thread): the CUDA toolkit (version 11.3.1) I am installing in my conda environment is different from the CUDA version installed at the system-wide level (as shown by the output of nvcc and nvidia-smi).

Q2: According to the answer in that Stack Overflow thread, the two can be separate. Or is this mismatch the reason my local CUDA installation fails?



Solution 1:[1]

I have solved the issue.

Disclaimer: I am new to CUDA. The following answer is based on (a) what I have read in other threads and (b) my own experience applying those discussions.

Core logic: the CUDA driver's version must be >= the CUDA runtime version.

Reference: Different CUDA versions shown by nvcc and NVIDIA-smi

In most cases, if nvidia-smi reports a CUDA version that is numerically equal to or higher than the one reported by nvcc -V, this is not a cause for concern. That is a defined compatibility path in CUDA (newer drivers/driver API support "older" CUDA toolkits/runtime API).
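The rule of thumb above can be expressed as a small comparison. This is a minimal sketch, not NVIDIA's full compatibility matrix: it encodes only the "driver's CUDA version >= runtime version" rule quoted here, and the helper names are hypothetical. Only major.minor is compared because nvidia-smi reports the driver's CUDA version in that form (e.g. `11.4`), so a runtime of `11.3.1` still counts as covered by a driver reporting `11.3`.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '11.3.1' into (11, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_supports_runtime(driver_cuda: str, runtime_cuda: str) -> bool:
    """Rule of thumb only (assumption): driver's CUDA version >= runtime version.

    Compares major.minor, since nvidia-smi reports the driver's CUDA
    version as major.minor.
    """
    return parse_version(driver_cuda)[:2] >= parse_version(runtime_cuda)[:2]


# Before the driver upgrade: driver supports CUDA 10.0, conda runtime is 11.3.1.
print(driver_supports_runtime("10.0", "11.3.1"))  # False
# After the upgrade: driver supports CUDA 11.4.
print(driver_supports_runtime("11.4", "11.3.1"))  # True
```

Comparing tuples of integers rather than raw strings avoids lexicographic surprises such as `"10.0" < "9.1"` being true as strings.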

Since I am using conda's cudatoolkit:

  • Driver API: reported by nvidia-smi
  • Runtime API: provided by conda's cudatoolkit

With cudatoolkit 11.3.1, my driver (per nvidia-smi) only supported CUDA Version 10.0, which is lower than the runtime version.

Solution: Upgrade NVIDIA drivers.

I upgraded the NVIDIA drivers following the instructions at https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-18-04-bionic-beaver-linux
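After upgrading, one way to confirm which driver is actually loaded is to ask nvidia-smi for just the driver version. A minimal sketch, assuming nvidia-smi is on the PATH; `installed_driver_versions` is a hypothetical helper, while `--query-gpu=driver_version` and `--format=csv,noheader` are standard nvidia-smi options. It returns one entry per GPU, or an empty list if nvidia-smi is missing or fails.

```python
import shutil
import subprocess
from typing import List


def installed_driver_versions() -> List[str]:
    """Driver version reported for each GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except subprocess.CalledProcessError:
        # nvidia-smi exists but exited with an error (e.g. driver not loaded).
        return []
    # One non-empty line per GPU.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(installed_driver_versions())
```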

After the upgrade, here's the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
| 27%   46C    P8    12W / 250W |     19MiB / 11177MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:06:00.0 Off |                  N/A |
| 25%   44C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:09:00.0 Off |                  N/A |
| 25%   39C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3636      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      4263      G   /usr/bin/gnome-shell                6MiB |
+-----------------------------------------------------------------------------+

Now the driver's CUDA version (11.4) >= the runtime version (11.3.1).

PyTorch is now able to use CUDA with GPU:

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True
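`torch.cuda.is_available()` returning `True` is a good sign; running a tiny computation on each GPU is a stricter end-to-end check. The sketch below is an illustration that assumes torch may or may not be installed; `gpu_smoke_test` is a hypothetical helper name, while `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.device` and `torch.ones(..., device=...)` are standard PyTorch calls.

```python
def gpu_smoke_test() -> bool:
    """True if a tiny tensor can be created and multiplied on every visible GPU."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    for index in range(torch.cuda.device_count()):
        device = torch.device("cuda", index)
        x = torch.ones(2, 2, device=device)
        # A trivial matrix product forces real work on this GPU;
        # ones(2, 2) @ ones(2, 2) is all 2s, so the sum must be 8.
        if (x @ x).sum().item() != 8.0:
            return False
    return True


print(gpu_smoke_test())
```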

Solution 2:[2]

Is the NVIDIA driver correctly installed? Run nvidia-smi to verify. This issue may be caused by a mismatch between the driver version and the cudatoolkit version.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kaushik Acharya
Solution 2 Florin