Keras loss value significant jump
I am working on a simple neural network in Keras with TensorFlow. There is a significant jump in the loss value from the last mini-batch of epoch L-1 to the first mini-batch of epoch L.
I am aware that the loss should decrease as the number of iterations increases, but a significant jump in loss after each epoch looks strange. Here is the code snippet:
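import tensorflow as tf
from tensorflow.keras import backend as K

tf.keras.initializers.he_uniform(seed=None)
initializer = tf.keras.initializers.he_uniform()

def my_loss(y_true, y_pred):
    epsilon = 1e-30  # epsilon is added to avoid inf/nan
    y_pred = K.cast(y_pred, K.floatx())
    y_true = K.cast(y_true, K.floatx())
    # element-wise binary cross-entropy (log-likelihood form)
    loss = y_true * K.log(y_pred + epsilon) + (1 - y_true) * K.log(1 - y_pred + epsilon)
    loss = K.mean(loss, axis=-1)  # average over the output units
    loss = K.mean(loss)           # average over the samples in the batch
    loss = -1 * loss
    return loss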

inputs = tf.keras.Input(shape=(140,))
x = tf.keras.layers.Dense(1000, kernel_initializer=initializer)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dense(1000, kernel_initializer=initializer)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(1000, kernel_initializer=initializer)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(100, kernel_initializer=initializer)(x)
outputs = tf.keras.activations.sigmoid(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

opt = tf.keras.optimizers.Adam()
recall1 = tf.keras.metrics.Recall(top_k=8)
c_entropy = tf.keras.losses.BinaryCrossentropy()
model.compile(loss=c_entropy, optimizer=opt, metrics=[recall1, my_loss], run_eagerly=True)
model.fit(X_train_test, Y_train_test, epochs=epochs, batch_size=batch, shuffle=True, verbose=1)
When I searched online, I found this article, which suggests that Keras calculates a moving average over the mini-batches. I also read somewhere that the array used for the moving average is reset after each epoch, which is why we obtain a very smooth curve within an epoch but a jump after the epoch.
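The reset effect described above can be reproduced without Keras. The following is a minimal sketch in plain Python, assuming the displayed value is a cumulative mean of the batch losses seen so far in the current epoch. The function name `running_means` and the loss numbers are made up for illustration.

```python
def running_means(batch_losses):
    """Return the cumulative mean after each batch, as a progress bar might display it."""
    means, total = [], 0.0
    for count, loss in enumerate(batch_losses, start=1):
        total += loss
        means.append(total / count)
    return means

# Hypothetical raw per-batch losses for two epochs (made-up, steadily decreasing).
epoch_1 = [1.0, 0.8, 0.6, 0.4]
epoch_2 = [0.3, 0.25, 0.2, 0.15]

# The accumulator starts from scratch in each epoch, so each epoch is averaged separately.
shown_1 = running_means(epoch_1)
shown_2 = running_means(epoch_2)

print("displayed in epoch 1:", shown_1)
print("displayed in epoch 2:", shown_2)
```

In this toy run the raw loss only moves from 0.4 to 0.3 across the epoch boundary, while the displayed value moves from roughly 0.7 to 0.3, because the last displayed value of epoch 1 still carries the large early losses of that epoch.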
To avoid the moving average, I implemented my own loss function, which should output the loss value of each mini-batch instead of the moving average over the batches. Since every mini-batch is different, the corresponding losses should differ as well, so I expected my implementation to report a fluctuating loss value from one mini-batch to the next. Instead, I obtain exactly the same values as the Keras loss function.
I am unclear about:
- Is Keras calculating a moving average over the mini-batches, with the array reset after each epoch, and is that what causes the jump? If not, what is causing the jump in the loss value?
- Is my implementation of the loss for each mini-batch correct? If not, how can I obtain the loss value of each mini-batch during training?
Solution 1:[1]
Keras does in fact show a moving average (a running mean over the mini-batches seen so far in the current epoch) instead of the "raw" loss values. The moving-average array is reset after each epoch, which is why we see a huge jump after each epoch. To acquire the raw loss values, one should implement a callback as shown below:
# assumes `keras` is already imported in your session
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        # initialize a list at the beginning of training
        self.losses = []

    def on_batch_end(self, batch, logs=None):
        # store the loss reported for this mini-batch
        self.losses.append((logs or {}).get('loss'))

mycallback = LossHistory()
Then pass it to model.fit:
model.fit(X, Y, epochs=epochs, batch_size=batch, shuffle=True, verbose = 0, callbacks=[mycallback])
print(mycallback.losses)
I tested with the following configuration:
Keras 2.3.1
Tensorflow 2.1.0
Python 3.7.9
For some reason, it didn't work with the following configuration:
Keras 2.4.3
Tensorflow 2.2.0
Python 3.8.5
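The cause of the failure on the newer configuration is not confirmed here. One possible explanation, stated as an assumption, is that the value passed to the batch-level hook in that version is already the running mean for the epoch. If that is the case, and if every mini-batch in the epoch has the same size, the raw values can be recovered arithmetically. The sketch below uses a hypothetical helper, `raw_from_running_means`, and made-up numbers:

```python
def raw_from_running_means(running):
    """Invert a cumulative mean: loss_k = k * mean_k - (k - 1) * mean_(k-1).

    Assumes `running` holds the values logged within ONE epoch and that
    all batches carry equal weight (same batch size).
    """
    raw, previous_total = [], 0.0
    for count, mean in enumerate(running, start=1):
        total = count * mean          # sum of the first `count` raw losses
        raw.append(total - previous_total)
        previous_total = total
    return raw

# Hypothetical running means collected during a single epoch.
logged = [1.0, 0.9, 0.8, 0.7]
recovered = raw_from_running_means(logged)
print(recovered)
```

A list collected over several epochs would need to be split per epoch first, since the average restarts at each epoch. A smaller final batch would also break the equal-weight assumption.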
To answer the second question, the implementation of the loss function my_loss is correct, and the values obtained are very close to those generated by the built-in function tf.keras.losses.BinaryCrossentropy().
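The closeness of the two can be checked by hand on a few numbers. The sketch below is plain Python and does not call Keras. `my_loss_value` mirrors the formula of my_loss on flat lists, and `clipped_bce_value` is an assumed stand-in for a built-in style computation that clips predictions with 1e-7 (the default value of `K.epsilon()`); the exact internals of the built-in loss may differ between versions. The sample labels and predictions are made up.

```python
import math

def my_loss_value(y_true, y_pred, epsilon=1e-30):
    """Mean binary cross-entropy written the same way as my_loss, for flat lists."""
    terms = [
        t * math.log(p + epsilon) + (1 - t) * math.log(1 - p + epsilon)
        for t, p in zip(y_true, y_pred)
    ]
    return -sum(terms) / len(terms)

def clipped_bce_value(y_true, y_pred, epsilon=1e-7):
    """Same formula, but predictions are first clipped to [epsilon, 1 - epsilon]."""
    clipped = [min(max(p, epsilon), 1 - epsilon) for p in y_pred]
    terms = [
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ]
    return -sum(terms) / len(terms)

# Hypothetical labels and sigmoid outputs for one small batch.
y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.9, 0.2, 0.7, 0.4]

custom = my_loss_value(y_true, y_pred)
clipped = clipped_bce_value(y_true, y_pred)
print(custom, clipped)
```

For predictions away from 0 and 1 the two agree; they only diverge when a prediction saturates, where the choice of epsilon starts to matter.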
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |