Nearly 0% GPU-Util but high GPU Memory
I'm a newbie to machine learning. I'm training a fairly simple tutorial model on the fashion_mnist dataset on Windows 10. However, training took so long that I never finished it, while the same code on my friend's Linux system finished in under a minute.
I tried to diagnose the problem, but the setup and environment on my computer seemed fine.
import tensorflow as tf
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
print(tf.test.is_built_with_cuda())
With this output:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13701120911614314629
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3061212774
locality {
bus_id: 1
links {
}
}
incarnation: 7589776483736281928
physical_device_desc: "device: 0, name: GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
True
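A more direct check (a minimal sketch, assuming TensorFlow 2.1 or newer) gives the same answer:
import tensorflow as tf

# List every GPU TensorFlow itself can see; an empty list means no GPU is visible.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
# e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]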
The problem is that nvidia-smi shows almost 0% GPU-Util but high GPU memory usage:
C:\Users\Herr LU>nvidia-smi
Mon Apr 06 16:36:53 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 442.19 Driver Version: 442.19 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1650 WDDM | 00000000:01:00.0 Off | N/A |
| N/A 64C P0 18W / N/A | 3256MiB / 4096MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 22728 C ...al\Programs\Python\Python37\pythonw.exe N/A |
+-----------------------------------------------------------------------------+
C:\Users\Herr LU>
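nvidia-smi can also be left polling while training runs; the -l flag refreshes the readout every second, which makes brief utilization spikes easier to catch:
C:\Users\Herr LU>nvidia-smi -l 1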
Here is the code:
#shoes recognition
import tensorflow as tf
from tensorflow import keras
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
#import the dataset of clothes, returns arrays
mnist = keras.datasets.fashion_mnist
#separate training data and testing data, which is already done for us
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
import matplotlib.pyplot as plt
#show the array as a picture, cmap=colormap
#plt.imshow(training_images[0])
#print(training_labels[0])
#print(training_images[0])
with tf.device('/device:gpu:0'):
    #normalize the pixel values to 0~1
    training_images = training_images / 255.0
    test_images = test_images / 255.0
    #build a model
    model = keras.Sequential([keras.layers.Flatten(),
                              keras.layers.Dense(128, activation=tf.nn.relu),
                              keras.layers.Dense(10, activation=tf.nn.softmax)])
    #compile the model with an optimizer and a loss function
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    #train the model with the data
    model.fit(training_images, training_labels, epochs=5)
    #evaluate the model
    model.evaluate(test_images, test_labels)
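One way to confirm where each op actually runs (a minimal sketch, assuming TensorFlow 2.x) is to enable device placement logging before any ops are created:
import tensorflow as tf

# Print the device every op is placed on; call this before building the model.
tf.debugging.set_log_device_placement(True)

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)
# Each op then logs a line such as:
# Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0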
What should I do to solve this problem?
Solution 1:[1]
You have to watch the CUDA engine if you really want to track GPU usage. Open Task Manager, click the Performance tab, and select your GPU; in the GPU section, switch any one of the first four graphs to "Cuda" using its dropdown menu, and you will see whether the CUDA cores are actually in use. (By default, Task Manager shows the 3D engine, which stays near 0% for compute workloads even when the GPU is busy.)
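As a quick sanity check that the CUDA cores really are engaged (a rough sketch, assuming TensorFlow 2.x in eager mode), a large matmul should run dramatically faster on the GPU than on the CPU:
import time
import tensorflow as tf

x = tf.random.normal((4000, 4000))

# Time the same workload on each device; .numpy() forces the result back to host.
for dev in ('/GPU:0', '/CPU:0'):
    with tf.device(dev):
        t0 = time.time()
        for _ in range(10):
            y = tf.matmul(x, x)
        _ = y.numpy()
    print(dev, time.time() - t0, 'seconds')
While the GPU pass runs, the "Cuda" graph in Task Manager should visibly spike.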
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Gavriel Cohen |