PyTorch with CUDA local installation fails
I am trying to install PyTorch with CUDA. I followed the instructions (installation using conda) mentioned in https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
The conda install command runs without giving any error.
conda list displays the following:
# Name Version Build Channel
cudatoolkit 11.3.1 h2bc3f7f_2
pytorch 1.11.0 py3.9_cuda11.3_cudnn8.2.0_0 pytorch
pytorch-mutex 1.0 cuda pytorch
torch 1.10.2 pypi_0 pypi
torchaudio 0.11.0 py39_cu113 pytorch
torchvision 0.11.3 pypi_0 pypi
But when I check whether the GPU driver and CUDA are enabled and accessible by PyTorch,
torch.cuda.is_available()
returns False.
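A minimal diagnostic along these lines (a sketch, assuming the conda environment's Python) shows which torch build is actually imported and which CUDA runtime it was compiled against, which helps separate a driver problem from a CPU-only or shadowed pip install:
import torch

# Which torch is actually imported (a pip "torch" can shadow the conda "pytorch" package)
print("torch version :", torch.__version__)
print("torch location:", torch.__file__)

# CUDA runtime this build was compiled against (None for CPU-only builds)
print("built for CUDA:", torch.version.cuda)

# Whether a usable driver/runtime pair is visible right now
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device 0      :", torch.cuda.get_device_name(0))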
Prior to the PyTorch installation, I checked and confirmed the prerequisites mentioned in
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions
Here are my ubuntu server details:
Environment:
- OS/kernel:
Ubuntu 18.04.6 LTS (GNU/Linux 4.15.0-154-generic x86_64)
The footnote under "Table 1. Native Linux Distribution Support in CUDA 11.6" mentions that for Ubuntu LTS on x86-64, the Server LTS kernel (e.g. 4.15.x for 18.04) is supported in CUDA 11.6.
- GCC
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
- GLIBC
ldd (Ubuntu GLIBC 2.27-3ubuntu1.5) 2.27
- GPU
GeForce GTX 1080 Ti
- Kernel headers and development packages
$ uname -r
4.15.0-176-generic
As per my understanding, a conda PyTorch installation with CUDA installs the CUDA driver too.
I am not sure where I went wrong. Thanks in advance.
EDIT:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
nvcc shows CUDA version 9.1, whereas
$ nvidia-smi
Wed May 11 06:44:31 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104 Driver Version: 410.104 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:05:00.0 Off | N/A |
| 25% 40C P8 11W / 250W | 18MiB / 11177MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:06:00.0 Off | N/A |
| 25% 40C P8 11W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:09:00.0 Off | N/A |
| 25% 35C P8 11W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 4119 G /usr/lib/xorg/Xorg 9MiB |
| 0 4238 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
nvidia-smi shows CUDA version 10.0.
This article, https://varhowto.com/check-cuda-version/, mentions that nvcc refers to the CUDA toolkit, whereas nvidia-smi refers to the NVIDIA driver.
Q1: Does this show that there are two different CUDA installations at the system-wide level?
Nvidia Cudatoolkit vs Conda Cudatoolkit
The CUDA toolkit (version 11.3.1) I am installing in my conda environment is different from the one installed at the system-wide level (as shown by the output of nvcc and nvidia-smi).
Q2: As per the answer in the above Stack Overflow thread, they can be separate. Or is this the reason my local CUDA installation fails?
Solution 1:[1]
I have solved the issue.
Disclaimer: I am a newbie in CUDA. The following answer is based on (a) what I have read in other threads and (b) my experience from those discussions.
Core Logic: CUDA driver's version >= CUDA runtime version
Reference: Different CUDA versions shown by nvcc and NVIDIA-smi
In most cases, if nvidia-smi reports a CUDA version that is numerically equal to or higher than the one reported by nvcc -V, this is not a cause for concern. That is a defined compatibility path in CUDA (newer drivers/driver API support "older" CUDA toolkits/runtime API).
As I am using conda's cudatoolkit:
- Driver API: nvidia-smi
- Runtime API: conda's cudatoolkit
My conda environment had cudatoolkit 11.3.1, but nvidia-smi reported only CUDA Version: 10.0, so the driver was older than the runtime.
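A small sketch of this comparison (driver_cuda_version is just an illustrative helper, assuming nvidia-smi is on the PATH and torch is importable):
import re
import subprocess
import torch

def driver_cuda_version():
    # Parse the "CUDA Version: X.Y" field printed in nvidia-smi's header
    out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    match = re.search(r"CUDA Version:\s*([\d.]+)", out)
    return match.group(1) if match else None

runtime = torch.version.cuda      # e.g. "11.3" for the conda cudatoolkit build
driver = driver_cuda_version()    # e.g. "10.0" before the driver upgrade

print("runtime (toolkit) CUDA:", runtime)
print("driver-supported CUDA :", driver)

# Core logic: the driver must support a CUDA version >= the runtime version,
# otherwise torch.cuda.is_available() returns False.
if runtime and driver:
    if tuple(map(int, driver.split("."))) < tuple(map(int, runtime.split("."))):
        print("Driver too old for this cudatoolkit; upgrade the NVIDIA driver.")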
Solution: Upgrade NVIDIA drivers.
I upgraded the NVIDIA drivers following the instructions at https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-18-04-bionic-beaver-linux
After the upgrade, here is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:05:00.0 Off | N/A |
| 27% 46C P8 12W / 250W | 19MiB / 11177MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:06:00.0 Off | N/A |
| 25% 44C P8 11W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:09:00.0 Off | N/A |
| 25% 39C P8 11W / 250W | 2MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3636 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 4263 G /usr/bin/gnome-shell 6MiB |
+-----------------------------------------------------------------------------+
Now the driver's CUDA version (11.4) >= the runtime version (11.3.1).
PyTorch is now able to use CUDA with GPU:
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: True
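As a further sanity check (a minimal sketch), allocating a tensor directly on the GPU exercises the driver and runtime end to end:
import torch

device = torch.device("cuda")               # first visible GPU (cuda:0)
x = torch.randn(1000, 1000, device=device)  # allocate directly on the GPU
y = x @ x                                   # matrix multiply runs on the GPU
print(y.device, y.shape)                    # cuda:0 torch.Size([1000, 1000])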
Solution 2:[2]
Is the NVIDIA driver correctly installed? Run nvidia-smi to validate that. This issue may be caused by a mismatch between the driver version and the cudatoolkit version.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Kaushik Acharya |
| Solution 2 | Florin |