tensorflow:Can save best model only with val_acc available, skipping

I have an issue with tf.keras.callbacks.ModelCheckpoint. As you can see in my log, the warning always appears just before the last step of each epoch, which is where the validation accuracy is calculated. As a result, ModelCheckpoint never finds val_acc.

Epoch 1/30
1/8 [==>...........................] - ETA: 19s - loss: 1.4174 - accuracy: 0.3000
2/8 [======>.......................] - ETA: 8s - loss: 1.3363 - accuracy: 0.3500 
3/8 [==========>...................] - ETA: 4s - loss: 1.3994 - accuracy: 0.2667
4/8 [==============>...............] - ETA: 3s - loss: 1.3527 - accuracy: 0.3250
6/8 [=====================>........] - ETA: 1s - loss: 1.3042 - accuracy: 0.3333
WARNING:tensorflow:Can save best model only with val_acc available, skipping.
8/8 [==============================] - 4s 482ms/step - loss: 1.2846 - accuracy: 0.3375 - val_loss: 1.3512 - val_accuracy: 0.5000

Epoch 2/30
1/8 [==>...........................] - ETA: 0s - loss: 1.0098 - accuracy: 0.5000
3/8 [==========>...................] - ETA: 0s - loss: 0.8916 - accuracy: 0.5333
5/8 [=================>............] - ETA: 0s - loss: 0.9533 - accuracy: 0.5600
6/8 [=====================>........] - ETA: 0s - loss: 0.9523 - accuracy: 0.5667
7/8 [=========================>....] - ETA: 0s - loss: 0.9377 - accuracy: 0.5714
WARNING:tensorflow:Can save best model only with val_acc available, skipping.
8/8 [==============================] - 1s 98ms/step - loss: 0.9229 - accuracy: 0.5750 - val_loss: 1.2507 - val_accuracy: 0.5000

This is my code for training the CNN.

callbacks = [
        TensorBoard(log_dir=r'C:\Users\reda\Desktop\logs\{}'.format(Name),
                    histogram_freq=1),
        ModelCheckpoint(filepath=r"C:\Users\reda\Desktop\checkpoints\{}".format(Name), monitor='val_acc',
                        verbose=2, save_best_only=True, mode='max')]
history = model.fit_generator(
        train_data_gen, 
        steps_per_epoch=total_train // batch_size,
        epochs=epochs,
        validation_data=val_data_gen,
        validation_steps=total_val // batch_size,
        callbacks=callbacks)


Solution 1:[1]

I know how frustrating these things can be, but TensorFlow requires the monitored name to match the metric name exactly as it is logged. Your log reports val_accuracy, not val_acc.

You need to monitor 'val_accuracy':

metric = 'val_accuracy'
ModelCheckpoint(filepath=r"C:\Users\reda.elhail\Desktop\checkpoints\{}".format(Name), monitor=metric,
                verbose=2, save_best_only=True, mode='max')
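
To see why the exact name matters, here is a minimal sketch of the kind of lookup involved. It is a simplified imitation, not the actual Keras source: at the end of an epoch the callback is handed a dict of logged values and looks up the monitored name in it. The helper `should_save` is hypothetical, and the `epoch_logs` values are copied from the end of epoch 1 in the question's log.

```python
import logging


def should_save(logs, monitor, best):
    """Simplified imitation of a save_best_only check with mode='max'.

    Hypothetical helper, not a Keras API. Returns (save, new_best).
    """
    current = logs.get(monitor)
    if current is None:
        # The monitored name is not a key in the epoch logs, so there is
        # nothing to compare and the checkpoint is skipped.
        logging.warning(
            "Can save best model only with %s available, skipping.", monitor)
        return False, best
    if current > best:
        return True, current
    return False, best


# Values taken from the end of epoch 1 in the question's log.
epoch_logs = {"loss": 1.2846, "accuracy": 0.3375,
              "val_loss": 1.3512, "val_accuracy": 0.5000}

skipped = should_save(epoch_logs, "val_acc", float("-inf"))
saved = should_save(epoch_logs, "val_accuracy", float("-inf"))
print(skipped)  # the 'val_acc' key is missing, so nothing is saved
print(saved)    # the 'val_accuracy' key exists, so the model would be saved
```

With `monitor='val_acc'` the lookup finds nothing and the warning fires every epoch, which matches the behaviour described in the question.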

Hope this helps =)

Solution 2:[2]

To add to the accepted answer, as I just struggled with this: not only do you have to use the full metric name, it must also be consistent across model.compile, ModelCheckpoint, and EarlyStopping. I had one set to accuracy and the other two set to val_accuracy, and it did not work.
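
A quick way to catch this kind of mismatch before training is to compare the callback monitors against the names you expect to be logged. The sketch below is plain Python and rests on one assumption: that metrics are logged under the exact strings passed to model.compile, with a `val_` prefix for validation. The helper names and the `monitors` dict are hypothetical, and `compiled_metrics` is assumed from the `accuracy` key in the question's log.

```python
def allowed_monitor_names(metric_names):
    """Names a callback could monitor, assuming each metric is logged under
    the string passed to compile() plus a 'val_' prefixed twin."""
    names = {"loss", "val_loss"}
    for name in metric_names:
        names.add(name)
        names.add("val_" + name)
    return names


def find_mismatches(metric_names, monitors):
    """Return the callbacks whose monitor matches no expected log name."""
    allowed = allowed_monitor_names(metric_names)
    return {cb: m for cb, m in monitors.items() if m not in allowed}


compiled_metrics = ["accuracy"]      # assumed: model.compile(metrics=['accuracy'])
monitors = {
    "ModelCheckpoint": "val_acc",    # the setting from the question
    "EarlyStopping": "val_accuracy",
}
mismatches = find_mismatches(compiled_metrics, monitors)
print(mismatches)
```

Any callback listed in `mismatches` is monitoring a name that would not appear in the logs under that assumption.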

Solution 3:[3]

I had the same issue: even after setting metric='val_accuracy', it did not work. So I changed it to metric='val_acc' and it worked. The name that works is whichever one appears in your own training log.

Solution 4:[4]

Print the metrics after training for one epoch like below. This will print the metrics defined for your model.

hist = model.fit(...)
# Each key is a name that a callback can monitor
for key in hist.history:
    print(key)

Now use one of the printed names as the monitor value in your callbacks. It will work like a charm.

This hack was given by the gentleman in the link below. Thanks to him!! https://github.com/tensorflow/tensorflow/issues/33163#issuecomment-540451749
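
Building on the same idea, the choice can be automated once the keys are known. This is only a sketch: `pick_monitor` is a hypothetical helper, and the `history` dict is a stand-in for `hist.history` after one epoch, with keys and values taken from the question's log.

```python
def pick_monitor(history_keys, candidates=("val_accuracy", "val_acc")):
    """Return the first candidate name present in the history keys, else None."""
    for name in candidates:
        if name in history_keys:
            return name
    return None


# Stand-in for hist.history after one epoch (values from the question's log).
history = {"loss": [1.2846], "accuracy": [0.3375],
           "val_loss": [1.3512], "val_accuracy": [0.5]}

monitor = pick_monitor(history)
print(monitor)
```

The returned name can then be passed as the monitor argument of ModelCheckpoint. If the helper returns None, no validation accuracy is being logged under either candidate name.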

Solution 5:[5]

If you are passing validation_steps or steps_per_epoch to model.fit(), remove those parameters. The validation loss and accuracy will start appearing. Pass as few parameters as possible:

model_history = model.fit(
    x=aug.flow(X_train, y_train, batch_size=16),
    epochs=EPOCHS,
    validation_data=(X_val, y_val),  # passed as a tuple
    callbacks=[callbacks_list])
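
The answer does not say why the step arguments would interfere. One possibility, offered purely as an assumption, is the arithmetic behind an expression such as `total_val // batch_size` from the question: floor division drops any partial batch and evaluates to zero when the validation set is smaller than one batch. The sketch below shows only that arithmetic, with made-up sizes and hypothetical helper names. It makes no claim about how Keras reacts to a zero step count.

```python
import math


def floor_steps(num_samples, batch_size):
    """Steps as computed in the question: any partial batch is dropped."""
    return num_samples // batch_size


def ceil_steps(num_samples, batch_size):
    """Steps that also cover a final partial batch."""
    return math.ceil(num_samples / batch_size)


total_val, batch_size = 10, 16  # hypothetical sizes, not from the question
zero_steps = floor_steps(total_val, batch_size)
one_step = ceil_steps(total_val, batch_size)
print(zero_steps, one_step)
```

Printing the computed step counts before calling fit is a cheap way to rule this case in or out.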

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Brian Mark Anderson
Solution 2 BlueTurtle
Solution 3 Vilas
Solution 4 Sachin Mohan
Solution 5