Keras history callback loss does not match console output of loss
I am currently training a CNN in Keras. I want to log the history of the training process for later visualization, which I do with:
import numpy as np

history_callback = model.fit(train_generator,
                             steps_per_epoch=EPOCH_STEP_TRAIN,
                             validation_data=test_generator,
                             validation_steps=EPOCH_STEP_TEST,
                             epochs=NUM_OF_EPOCHS,
                             callbacks=callbacks)

# Per-epoch losses as reported by the History object returned from fit()
val_loss_history = history_callback.history['val_loss']
loss_history = history_callback.history['loss']

numpy_val_loss_history = np.array(val_loss_history)
numpy_loss_history = np.array(loss_history)

np.savetxt(checkpoint_folder + "valid_loss_history.txt", numpy_val_loss_history, delimiter=",")
np.savetxt(checkpoint_folder + "loss_history.txt", numpy_loss_history, delimiter=",")
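For comparison, the same per-epoch numbers can also be written during training with the built-in CSVLogger callback. A minimal sketch, assuming callbacks is a plain Python list and reusing the generators above; the training_log.csv file name is made up:

import tensorflow as tf

# CSVLogger receives the same per-epoch logs dict that populates history.history,
# so it writes one row per epoch with loss, val_loss and all compiled metrics.
csv_logger = tf.keras.callbacks.CSVLogger(checkpoint_folder + "training_log.csv", append=False)

model.fit(train_generator,
          steps_per_epoch=EPOCH_STEP_TRAIN,
          validation_data=test_generator,
          validation_steps=EPOCH_STEP_TEST,
          epochs=NUM_OF_EPOCHS,
          callbacks=callbacks + [csv_logger])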
The validation loss is saved correctly and exactly matches the console output.
However, the training loss values I store do not match the values printed to the console during training. See here:
121/121 [==============================] - 61s 438ms/step - loss: 0.9004 - recall: 0.5097 - precision: 0.0292 - acc: 0.8391 - val_loss: 0.8893 - val_recall: 0.0000e+00 - val_precision: 0.0000e+00 - val_acc: 0.9995
Epoch 2/3
121/121 [==============================] - 52s 428ms/step - loss: 0.5830 - recall: 0.1916 - precision: 0.3660 - acc: 0.9898 - val_loss: 0.5422 - val_recall: 0.3007 - val_precision: 0.7646 - val_acc: 0.9996
Epoch 3/3
121/121 [==============================] - 52s 428ms/step - loss: 0.3116 - recall: 0.3740 - precision: 0.7848 - acc: 0.9920 - val_loss: 0.5248 - val_recall: 0.3119 - val_precision: 0.6915 - val_acc: 0.9996
And the output of history_callback.history['loss'] is:
0.8124346733093262
0.4653359651565552
0.30956554412841797
My loss function is:
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1e-9):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f**2) + K.sum(y_pred_f**2) + smooth)

def dice_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)
I also tried:
def dice_loss(y_true, y_pred):
    return tf.reduce_mean(1 - dice_coef(y_true, y_pred))
This didn't change anything.
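For completeness, the loss is attached at compile time in the usual way. The optimizer and metric objects below are assumptions, since the original compile call is not shown:

model.compile(optimizer="adam",
              loss=dice_loss,
              metrics=["acc",
                       tf.keras.metrics.Recall(name="recall"),
                       tf.keras.metrics.Precision(name="precision")])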
Is there anybody out there who can explain this weird behavior?
Solution 1:[1]
It seems I ran into the same issue.
What we have in common is TF version 2.4.1 (CUDA 11.2, Windows 10 Pro, Quadro T1000).
The issue disappeared when I trained the same network on the same data with TF version 2.8.0 (CUDA 11.4, Debian 10, Tesla T4).
The only other thing we have in common is that we are both using CNNs.
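If it helps to reproduce the comparison, the installed TensorFlow and CUDA build can be checked directly; a minimal sketch (the cuda_version key is only populated in GPU builds):

import tensorflow as tf

print(tf.__version__)                                      # e.g. 2.4.1 vs. 2.8.0
print(tf.sysconfig.get_build_info().get("cuda_version"))   # CUDA version the wheel was built against
print(tf.config.list_physical_devices("GPU"))              # GPUs visible to TensorFlow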
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | R. Giskard