Correct Implementation of Dice Loss in TensorFlow / Keras

I've been experimenting with the region-based Dice loss, but there are so many variations of it on the internet that I could not find two identical implementations. The problem is that they all produce different results. Below are the implementations I found. Some use a smoothing factor, which the authors in this paper call epsilon; some use it in both the numerator and the denominator; one implementation uses a gamma exponent, and so on.

Could someone please help me with the correct implementation?

import tensorflow as tf
import tensorflow.keras.backend as K
import numpy as np

def dice_loss1(y_true, y_pred, smooth=1e-6):
    '''
    https://www.kaggle.com/code/bigironsphere/loss-function-library-keras-pytorch/notebook
    '''
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    smooth = tf.cast(smooth, y_pred.dtype)
    
    y_pred = K.flatten(y_pred)
    y_true = K.flatten(y_true)
    
    # note: K.dot expects tensors of rank >= 2, so it raises an error on the flattened 1-D inputs
    intersection = K.sum(K.dot(y_true, y_pred))
    dice_coef = (2*intersection + smooth) / (K.sum(y_true) + K.sum(y_pred) + smooth)
    dice_loss = 1-dice_coef
    return dice_loss
    

def dice_loss2(y_true, y_pred, smooth=1e-6): # Only Smooth
    """
    https://gist.github.com/wassname/7793e2058c5c9dacb5212c0ac0b18a8a
    """
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    smooth = tf.cast(smooth, y_pred.dtype)
    
    intersection = K.sum(K.abs(y_true * y_pred), axis=-1)
    dice_coef = (2. * intersection + smooth) / (K.sum(K.square(y_true), axis=-1) + K.sum(K.square(y_pred), axis=-1) + smooth)
    return 1 - dice_coef


def dice_loss3(y_true, y_pred): # No gamma, no smooth
    '''
    https://lars76.github.io/2018/09/27/loss-functions-for-segmentation.html
    '''
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    
    # this variant applies a sigmoid itself, so it expects raw logits rather than probabilities
    y_pred = tf.math.sigmoid(y_pred)
    numerator = 2 * tf.reduce_sum(y_true * y_pred)
    denominator = tf.reduce_sum(y_true + y_pred)

    return 1 - numerator / denominator


def dice_loss4(y_true, y_pred, smooth=1e-6, gamma=1): # Gamma + smooth are used
    '''
    https://dev.to/_aadidev/3-common-loss-functions-for-image-segmentation-545o
    '''
    y_pred = tf.convert_to_tensor(y_pred)
    y_true = tf.cast(y_true, y_pred.dtype)
    smooth = tf.cast(smooth, y_pred.dtype)
    gamma = tf.cast(gamma, y_pred.dtype)

    numerator = 2 * tf.reduce_sum(tf.multiply(y_pred, y_true)) + smooth
    denominator = tf.reduce_sum(y_pred ** gamma) + tf.reduce_sum(y_true ** gamma) + smooth

    result = 1 - tf.divide(numerator, denominator)
    return result

y_true = np.array([[0,0,1,0],
                   [0,0,1,0],
                   [0,0,1.,0.]])

y_pred = np.array([[0,0,0.9,0],
                   [0,0,0.1,0],
                   [1,1,0.1,1.]])

# print(dice_loss1(y_true, y_pred)) # raises an error in K.dot()
print(dice_loss2(y_true, y_pred)) # returns an array of per-row values
print(dice_loss3(y_true, y_pred))
print(dice_loss4(y_true, y_pred))


Solution 1:[1]

I used a variation of the Dice loss for brain tumor segmentation. The implementation of the Dice coefficient I used was:

def dice_coef(y_true, y_pred, smooth=100):        
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return dice

In order to make it a loss, it needs to be made into a function we want to minimize. This can be accomplished by making it negative:

def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)

or subtracting it from 1:

def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)

or by applying some other transformation and then negating, for example taking the negative logarithm (which can smooth the gradients):

def dice_coef_loss(y_true, y_pred):
    return -K.log(dice_coef(y_true, y_pred))
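
Any of these variants can be passed straight to model.compile, since Keras only requires a callable taking (y_true, y_pred). Below is a minimal sketch; the toy convolutional model is purely illustrative and not part of the original answer:

import tensorflow as tf

# Purely illustrative toy model; any Keras model with per-pixel sigmoid outputs works the same way
inputs = tf.keras.Input(shape=(128, 128, 1))
x = tf.keras.layers.Conv2D(8, 3, padding='same', activation='relu')(inputs)
outputs = tf.keras.layers.Conv2D(1, 1, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer='adam',
              loss=dice_coef_loss,   # any of the variants above
              metrics=[dice_coef])   # track the raw coefficient during training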

The variable smooth corresponds to what you observed in other implementations under various names (smoothing, epsilon, etc.). Just for clarity, this smoothing variable exists to handle the case where the ground truth has very few (or no) white pixels (assuming white pixels belong to a class or to the boundary of an object, depending on your implementation).

If smooth is set too low, then when the ground truth has few or no white pixels and the predicted image has some non-zero number of white pixels, the model will be penalized more heavily. Setting smooth higher means that if the predicted image has a small number of white pixels while the ground truth has none, the loss value will be lower. Depending on how aggressive the model needs to be, though, a lower value may be preferable.

Here's an illustrative example:

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K


def dice_coef(y_true, y_pred, smooth):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return dice


def dice_coef_loss(y_true, y_pred, smooth):
    return 1 - dice_coef(y_true, y_pred, smooth)


if __name__ == '__main__':
    smooth = 10e-6
    y_pred = np.zeros((1, 128, 128))
    # one pixel is set to 1
    y_pred[0, 0, 0] = 1
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)
    y_true = tf.zeros((1, 128, 128), dtype=tf.float32)
    print(dice_coef(y_true, y_pred, smooth=smooth))
    print(dice_coef_loss(y_true, y_pred, smooth=smooth))

will print out:

tf.Tensor(9.9999e-06, shape=(), dtype=float32)
tf.Tensor(0.99999, shape=(), dtype=float32)

But if smooth is set to 100:

tf.Tensor(0.990099, shape=(), dtype=float32)
tf.Tensor(0.009900987, shape=(), dtype=float32)

Showing the loss reduces to 0.009 instead of 0.99.
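
For reference, the second pair of numbers comes from the same script with only the smooth argument in the two print calls changed:

print(dice_coef(y_true, y_pred, smooth=100))
print(dice_coef_loss(y_true, y_pred, smooth=100))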

For completeness, if you have multiple segmentation channels (B x W x H x K, where B is the batch size, W and H are the dimensions of your image, and K is the number of segmentation channels), the same concepts apply, but it can be implemented as follows:

def dice_coef_multilabel(y_true, y_pred, M, smooth):
    dice = 0
    for index in range(M):
        dice += dice_coef(y_true[:,:,:,index], y_pred[:,:,:,index], smooth)
    return dice

And it can be converted to a loss function through negation, or by subtracting it from M (since the summed coefficient ranges up to M), in the same way as dice_coef is. smooth could also be tuned per channel, if you supply a list or some other sequence (e.g. smooth_list):

def dice_coef_multilabel(y_true, y_pred, M, smooth_list):
    dice = 0
    for index in range(M):
        dice += dice_coef(y_true[:,:,:,index], y_pred[:,:,:,index], smooth_list[index])
    return dice
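
One practical detail: Keras calls a loss with only (y_true, y_pred), so the extra M and smooth arguments have to be bound beforehand, for example with a small wrapper function or functools.partial. A minimal sketch, using hypothetical names (NUM_CLASSES and model are placeholders, not part of the original answer):

NUM_CLASSES = 4  # placeholder: number of segmentation channels K

def dice_coef_multilabel_loss(y_true, y_pred):
    # a perfect prediction sums to NUM_CLASSES, so subtract from it to get a loss that bottoms out at 0
    return NUM_CLASSES - dice_coef_multilabel(y_true, y_pred, NUM_CLASSES, smooth=1e-6)

# `model` is assumed to be a Keras segmentation model with NUM_CLASSES output channels
model.compile(optimizer='adam', loss=dice_coef_multilabel_loss)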

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: danielcahall