Custom Keras loss function: binary cross entropy giving improper results

Does anyone have a convincing solution to make a custom binary cross entropy loss work?

I tried every approach I could think of (even making the whole training set the same size as the batch size, to eliminate any dependence on global averaging during batch-wise processing), but I still see a significant difference between my binary cross entropy implementation and the one from Keras (specified with loss = 'binary_crossentropy').

My custom binary cross entropy code is as follows:

    from keras import backend as K

    _EPSILON = K.epsilon()

    def _loss_tensor(y_true, y_pred):
        y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
        out = y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred)
        return -K.mean(out)

    def _loss_tensor2(y_true, y_pred):
        y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
        out = -(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
        return out

    def _loss_tensor3(y_true, y_pred):
        # Delegate directly to the backend implementation.
        return K.binary_crossentropy(y_true, y_pred)

None of these methods works. It does not help even if I apply K.mean() before returning the result from the custom loss function.

I am not able to understand what is special about using loss = 'binary_crossentropy'. When I use my custom loss function, training degrades and does not work as expected.

I need a custom loss function so that I can manipulate the loss depending on the error, penalizing one type of classification error more heavily than the other.
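For the asymmetric-penalty goal described above, a minimal sketch is shown below. The `fn_weight` knob is a hypothetical name (not from the question): it scales the positive-class term so that missing a true positive costs more than a false alarm. This assumes TF2 with eager execution.

```python
import numpy as np
from tensorflow.keras import backend as K

def weighted_bce(fn_weight=2.0):
    # Returns a BCE-style loss where the positive-class (y_true == 1) term
    # is multiplied by fn_weight; fn_weight=1.0 reduces to plain BCE.
    def loss(y_true, y_pred):
        y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
        term_pos = fn_weight * y_true * K.log(y_pred)
        term_neg = (1.0 - y_true) * K.log(1.0 - y_pred)
        return -K.mean(term_pos + term_neg, axis=-1)
    return loss
```

Such a closure can be passed to `model.compile(loss=weighted_bce(2.0), ...)` like any other loss callable.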



Solution 1:[1]

I have found a working approach for this requirement and posted it here: https://github.com/keras-team/keras/issues/4108

However, why the built-in function performs significantly differently from the explicit-formula implementation is unknown. I would expect it is mainly due to the handling of the upper and lower bounds of the predicted probability values in y_pred.
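To illustrate the point about probability bounds (this demo is not from the original answer), the clipping threshold directly caps how much a confidently wrong prediction can be penalized, so two otherwise identical BCE implementations with different epsilons can report very different losses:

```python
import numpy as np

def bce_clipped(y_true, y_pred, eps):
    # Plain binary cross entropy with predictions clipped to [eps, 1 - eps].
    p = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0])
y_pred = np.array([1e-9])  # a very confident wrong prediction

loose = bce_clipped(y_true, y_pred, 1e-2)  # capped at -log(1e-2)
tight = bce_clipped(y_true, y_pred, 1e-7)  # capped at -log(1e-7), much larger
```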

Solution 2:[2]

from tensorflow.keras import backend as K

def custom_binary_loss(y_true, y_pred): 
    # https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/backend.py#L4826
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    
    term_0 = (1 - y_true) * K.log(1 - y_pred + K.epsilon())  # Cancels out when target is 1 
    term_1 = y_true * K.log(y_pred + K.epsilon())  # Cancels out when target is 0

    return -K.mean(term_0 + term_1, axis=1)
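One way to sanity-check a custom loss like the one above (this check is not part of the original answer) is to compare it against the built-in `tf.keras.losses.binary_crossentropy` on a small array; the two should agree to within the epsilon added inside the logs:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

def custom_binary_loss(y_true, y_pred):
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    term_0 = (1 - y_true) * K.log(1 - y_pred + K.epsilon())
    term_1 = y_true * K.log(y_pred + K.epsilon())
    return -K.mean(term_0 + term_1, axis=1)

y_true = np.array([[1.0, 0.0, 1.0]])
y_pred = np.array([[0.8, 0.2, 0.6]])

custom = custom_binary_loss(y_true, y_pred).numpy()
builtin = tf.keras.losses.binary_crossentropy(y_true, y_pred).numpy()
```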

Solution 3:[3]

I also encountered the same problem when writing my custom BCE. Here is my solution:

import tensorflow as tf

def get_custom_bce(epsilon=1e-2):
  def custom_bce(y_true, y_pred):
    # Clamp each log argument away from zero with tf.math.maximum.
    pos = y_true * tf.math.log(tf.math.maximum(y_pred, tf.constant(epsilon)))
    neg = (1. - y_true) * tf.math.log(tf.math.maximum(1. - y_pred, tf.constant(epsilon)))
    return -tf.math.reduce_mean(pos + neg)
  return custom_bce

Sorry that I'm not so familiar with the Keras backend, but I believe the two APIs are interchangeable here. By the way, this loss is meant to be used after a sigmoid activation.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Seena
Solution 2 Milind Dalvi
Solution 3 ClaudeAn