'Keras loss value significant jump

I am working on a simple neural network in Keras with Tensorflow. There is a significant jump in loss value from the last mini-batch of epoch L-1 to the first mini-batch of epoch L.

I am aware that the loss should decrease with an increase in the number of iterations but a significant jump in loss after each epoch does looks strange. Here is the code snippet

tf.keras.initializers.he_uniform(seed=None)
initializer = tf.keras.initializers.he_uniform()

def my_loss(y_true, y_pred): 
   epsilon=1e-30 #epsilon is added to avoid inf/nan
   y_pred = K.cast(y_pred, K.floatx())
   y_true = K.cast(y_true, K.floatx())
   loss = y_true* K.log(y_pred+epsilon)  + (1-y_true)*K.log(1-y_pred+epsilon)
   loss = K.mean(loss, axis= -1) 
   loss = K.mean(loss)
   loss = -1*loss
   return loss

inputs = tf.keras.Input(shape=(140,))
x = tf.keras.layers.Dense(1000,kernel_initializer=initializer)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dense(1000,kernel_initializer=initializer)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(1000,kernel_initializer=initializer)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(100, kernel_initializer=initializer)(x)
outputs = tf.keras.activations.sigmoid(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)


opt = tf.keras.optimizers.Adam()
recall1 = tf.keras.metrics.Recall(top_k = 8)
c_entropy = tf.keras.losses.BinaryCrossentropy()

model.compile(loss=c_entropy, optimizer= opt , metrics = [recall1,my_loss], run_eagerly=True)

model.fit(X_train_test, Y_train_test, epochs=epochs, batch_size=batch, shuffle=True, verbose = 1)

When I search online, I found this article, which suggests that Keras calculates the moving average over the mini-batches. Also, I found somewhere that the array for calculating the moving average is reset after each epoch that's why we obtain a very smooth curve within an epoch but a jump after the epoch.

In order to avoid the moving average, I implemented my own loss function, which should output the loss values of the mini-batch instead of the moving average over the batches. As each mini-batch is different from each other; therefore the corresponding loss must also be different from each other. Due to this reason, I was expecting an arbitrary loss value on each mini-batch through my implementation of the loss function. Instead, I obtain exactly the same values as the loss function by Keras.

I am unclear about:

Is Keras calculating the moving average over the mini-batches, the array of which is reset after each epoch causing the jump. If not, then what is causing the jump behaviour in loss value.
Is my implementation of loss for each mini-batch correct? If not, then how can I obtain the loss value of the mini-batch during the training.

Solution 1:^[1]

Keras in fact shows the moving average instead of the "raw" loss values. The moving average array is reset after each epoch that's why we can see a huge jump after each epoch. In order to acquire the raw loss values, one should implement a callback as shown below:

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        #initialize a list at the begining of training
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))

mycallback = LossHistory()

Then call it in model.fit

model.fit(X, Y, epochs=epochs, batch_size=batch, shuffle=True, verbose = 0, callbacks=[mycallback])
print(mycallback.losses)

I tested with the following configuration

Keras 2.3.1
Tensorflow 2.1.0
Python 3.7.9

For some reason, it didn't work with the following configuration

Keras 2.4.3
Tensorflow 2.2.0
Python 3.8.5

To answer the second question, the implementation of the loss function my_loss is correct and the values obtained are pretty much close to the values generated by the built-in function.

tf.keras.losses.BinaryCrossentropy()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'Keras loss value significant jump

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]