Nearly 0% GPU-Util but high GPU memory usage

I'm new to machine learning. I'm training a fairly simple model from a tutorial on the fashion_mnist dataset on Windows 10. However, training was so slow that I never let it finish. The same code on my friend's Linux system took less than 1 minute.

I tried to diagnose the problem, but my computer's setup and environment seem fine.

import tensorflow as tf 
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
print(tf.test.is_built_with_cuda())

With the outcome:

device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13701120911614314629
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3061212774
locality {
  bus_id: 1
  links {
  }
}
incarnation: 7589776483736281928
physical_device_desc: "device: 0, name: GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
True

But the problem is that nvidia-smi shows almost 0% GPU-Util alongside high GPU memory usage.


C:\Users\Herr LU>nvidia-smi
Mon Apr 06 16:36:53 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 442.19       Driver Version: 442.19       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650   WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   64C    P0    18W /  N/A |   3256MiB /  4096MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     22728      C   ...al\Programs\Python\Python37\pythonw.exe N/A      |
+-----------------------------------------------------------------------------+

C:\Users\Herr LU>

Here is the code:

#shoes recognition
import tensorflow as tf
from tensorflow import keras

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

#fashion_mnist dataset module (images of clothing)
mnist = keras.datasets.fashion_mnist

#load the data, which is already separated into training data and testing data
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

import matplotlib.pyplot as plt

#show the array in pictures,cmap=colormap
#plt.imshow(training_images[0])
#print(training_labels[0])
#print(training_images[0])

with tf.device('/device:gpu:0'):
    #normalizing the color value to 0~1
    training_images = training_images/255.0
    test_images = test_images/255.0

    #Build a model
    model=keras.Sequential([keras.layers.Flatten(),
                            keras.layers.Dense(128,activation=tf.nn.relu),
                            keras.layers.Dense(10,activation=tf.nn.softmax)])

    #Compile the model with an optimizer and a loss function
    model.compile(optimizer = keras.optimizers.Adam(),
                  loss = 'sparse_categorical_crossentropy',
                  metrics = ['accuracy'])

    #train the model with data
    model.fit(training_images, training_labels, epochs=5)

    #evaluate the model
    model.evaluate(test_images, test_labels)

What should I do to solve this problem?



Solution 1:[1]

To see whether the GPU is really being used, you have to track CUDA activity rather than the default graphs. Open Task Manager, click the Performance tab, and select the GPU. In the GPU section, use the dropdown menu on any one of the first four graphs to change it to "CUDA". That graph then shows whether the CUDA cores are in use.
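A single nvidia-smi snapshot can miss short bursts of work, so it may help to sample utilization repeatedly while training runs. The following is a minimal sketch, not part of the original answer: it assumes nvidia-smi is on the PATH and Python 3.7 or later, and the helper names `parse_gpu_stats` and `read_gpu_stats` are made up for illustration.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, e.g. "0, 3256, 4096"
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Return the field as an int, or None if the driver reports no number."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_stats(csv_text):
    """Parse the CSV text produced by QUERY into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = line.split(",")
        stats.append({
            "util_percent": _to_int(util),
            "mem_used_mib": _to_int(used),
            "mem_total_mib": _to_int(total),
        })
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (hypothetical helper)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling `read_gpu_stats()` about once per second from a second terminal while `model.fit` is running gives a rough utilization trace. If `util_percent` stays at 0 for the whole run, the work is probably not reaching the GPU.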

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gavriel Cohen