Validation and training metrics stuck at very low values (image and mask generators)

I have image data (X_train) and mask data (y_train).

I want to train a U-Net. I am currently using an IoU metric, and the validation IoU is very low and constant!

I am not sure whether I am handling the scaling/preprocessing of the images and masks correctly.

I have tried using only rescale=1.0/255 in the generator, scaling only X_train and X_val (i.e. the images) and not the masks, and scaling inside the U-Net model (s = Lambda(lambda x: x / 255.0)(inputs)). I am not sure whether that is the problem; just wondering.

Here you can download the X_train and y_train data.

import tensorflow as tf
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D,  MaxPooling2D, Conv2DTranspose, \
    Dropout, Input, Concatenate, Lambda
from imgaug import augmenters as iaa
from tensorflow.keras import backend as K


# gpu setup
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
    

X_train = np.load('./X_train.npy')
y_train = np.load('./y_train.npy')


X_train = X_train.astype('uint8')
y_train = y_train.astype('uint8')


BATCH_SIZE=8
SEED=123
VAL_SPLIT = 0.2
IMG_HEIGHT = 256
IMG_WIDTH = 256

def augment(image):
    # Called by ImageDataGenerator on one image at a time
    # (a rank-3 array of shape (height, width, channels)).
    seq = iaa.Sequential([
        iaa.Fliplr(0.5),  # horizontal flips
        iaa.Flipud(0.5),  # vertical flips
        iaa.Sometimes(
            0.1,
            iaa.GaussianBlur(sigma=(0, 0.5))
        ),
        iaa.LinearContrast((0.75, 1.5)),
        iaa.Sharpen(alpha=(0, 1.0), lightness=(0.75, 1.5)),
        iaa.BlendAlphaSimplexNoise(
            iaa.EdgeDetect(0.3),
            upscale_method="linear"),
    ], random_order=True)

    return seq.augment_image(image)


def create_gen(X,
               y,
               batch_size=BATCH_SIZE,
               seed=SEED):
    
    X_train, X_val, y_train, y_val = \
        train_test_split(X,
                         y,
                         test_size=VAL_SPLIT)
        
    
    # Image data generator
    data_gen_args = dict(rescale = 1.0/255,
                         preprocessing_function=augment)

    data_gen_args_masks = dict(preprocessing_function=augment)
                             
    X_datagen = ImageDataGenerator(**data_gen_args)
    y_datagen = ImageDataGenerator(**data_gen_args_masks)
    
    X_datagen.fit(X_train, augment=True, seed=seed)
    y_datagen.fit(y_train, augment=True, seed=seed)
    
    X_train_augmented = X_datagen.flow(X_train,
                                       batch_size=batch_size,
                                       shuffle=True,
                                       seed=seed)
    y_train_augmented = y_datagen.flow(y_train,
                                       batch_size=batch_size,
                                       shuffle=True,
                                       seed=seed)
    
    # Validation data generator     
    data_gen_args_val = dict(rescale = 1.0/255)
                                                     
    X_datagen_val = ImageDataGenerator(**data_gen_args_val)
    y_datagen_val = ImageDataGenerator()
    
    X_datagen_val.fit(X_val, augment=True, seed=seed)
    y_datagen_val.fit(y_val, augment=True, seed=seed)
    
    X_val_after = X_datagen_val.flow(X_val,
                                     batch_size=batch_size,
                                     shuffle=False)
                                    
    y_val_after = y_datagen_val.flow(y_val,
                                     batch_size=batch_size,
                                     shuffle=False)
                                    
     
    train_generator = zip(X_train_augmented, y_train_augmented)
    val_generator = zip(X_val_after, y_val_after)
    
    steps_per_epoch = X_train_augmented.n // X_train_augmented.batch_size
    validation_steps = X_val_after.n // X_val_after.batch_size
    return train_generator, val_generator, steps_per_epoch, validation_steps


train_generator, val_generator, steps_per_epoch, validation_steps =  \
    create_gen(X_train,
               y_train,
               batch_size=BATCH_SIZE)


# Build U-Net model
inputs = Input((IMG_HEIGHT, IMG_WIDTH, 3))
#s = Lambda(lambda x: x / 255) (inputs)  # rescale inputs

c1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (inputs)
c1 = Dropout(0.1) (c1)
c1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c1)
p1 = MaxPooling2D((2, 2)) (c1)

c2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (p1)
c2 = Dropout(0.1) (c2)
c2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c2)
p2 = MaxPooling2D((2, 2)) (c2)

c3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (p2)
c3 = Dropout(0.2) (c3)
c3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c3)
p3 = MaxPooling2D((2, 2)) (c3)

c4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (p3)
c4 = Dropout(0.2) (c4)
c4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c4)
p4 = MaxPooling2D(pool_size=(2, 2)) (c4)

c5 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (p4)
c5 = Dropout(0.3) (c5)
c5 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c5)

u6 = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same') (c5)
u6 = Concatenate()([u6, c4])
c6 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (u6)
c6 = Dropout(0.2) (c6)
c6 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c6)

u7 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same') (c6)
u7 = Concatenate()([u7, c3])
c7 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (u7)
c7 = Dropout(0.2) (c7)
c7 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c7)

u8 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same') (c7)
u8 = Concatenate()([u8, c2])
c8 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (u8)
c8 = Dropout(0.1) (c8)
c8 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c8)

u9 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same') (c8)
u9 = Concatenate()([u9, c1])
c9 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (u9)
c9 = Dropout(0.1) (c9)
c9 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same') (c9)

outputs = Conv2D(1, (1, 1), activation='sigmoid') (c9)

model = Model(inputs=[inputs], outputs=[outputs])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[iouMetric])

EPOCHS = 40

model.fit(train_generator,
          validation_data=val_generator,
          batch_size=BATCH_SIZE,
          steps_per_epoch=steps_per_epoch,
          validation_steps=validation_steps,
          epochs=EPOCHS)

Code for the IoU metric:

def castF(x):
    return K.cast(x, K.floatx())

def castB(x):
    return K.cast(x, bool)

def iou_loss_core(true,pred):  #this can be used as a loss if you make it negative
    intersection = true * pred
    notTrue = 1 - true
    union = true + (notTrue * pred)

    return (K.sum(intersection, axis=-1) + K.epsilon()) / (K.sum(union, axis=-1) + K.epsilon())

def iouMetric(true, pred):

    thresholds = [0.5 + (i * 0.05) for i in range(5)]

    #flattened images (batch, pixels)
    true = K.batch_flatten(true)
    pred = K.batch_flatten(pred)
    pred = castF(K.greater(pred, 0.5))

    #total white pixels - (batch,)
    trueSum = K.sum(true, axis=-1)
    predSum = K.sum(pred, axis=-1)

    #has mask or not per image - (batch,)
    true1 = castF(K.greater(trueSum, 1))    
    pred1 = castF(K.greater(predSum, 1))

    #to get images that have mask in both true and pred
    truePositiveMask = castB(true1 * pred1)

    #separating only the possible true positives to check iou
    testTrue = tf.boolean_mask(true, truePositiveMask)
    testPred = tf.boolean_mask(pred, truePositiveMask)

    #getting iou and threshold comparisons
    iou = iou_loss_core(testTrue,testPred) 
    truePositives = [castF(K.greater(iou, tres)) for tres in thresholds]

    #mean over thresholds for true positives, then total sum
    truePositives = K.mean(K.stack(truePositives, axis=-1), axis=-1)
    truePositives = K.sum(truePositives)

    #to get images that don't have mask in both true and pred
    trueNegatives = (1-true1) * (1 - pred1) # = 1 -true1 - pred1 + true1*pred1
    trueNegatives = K.sum(trueNegatives) 

    return (truePositives + trueNegatives) / castF(K.shape(true)[0])

I have tried other metrics as well; Dice loss is also constant and very low. Accuracy is around 79% and constant.



Solution 1:

The problem is with the preprocessing. According to the tf.keras.preprocessing.image.ImageDataGenerator documentation:

preprocessing_function: function that will be applied on each input. The function will run after the image is resized and augmented. The function should take one argument: one image (Numpy tensor with rank 3), and should output a Numpy tensor with the same shape.

So this function runs in addition to the augmentation you configured inside ImageDataGenerator. The problem is that you have already scaled the images with 1.0 / 255, so the augment() function receives a scaled image. But according to the imgaug documentation, it expects an unscaled image (see the quoted requirement below):

'images' should be either a 4D numpy array of shape (N, height, width, channels) or a list of 3D numpy arrays, each having shape (height, width, channels). Grayscale images must have shape (height, width, 1) each. All images must have numpy's dtype uint8. Values are expected to be in range 0-255.
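One way to reconcile the two requirements (a minimal sketch of my own, not part of the original answer) is to drop rescale from the image generator and let the preprocessing function cast back to uint8, run imgaug, and only divide by 255 at the very end:

import numpy as np
from imgaug import augmenters as iaa
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# same kind of augmentation pipeline as in the question (shortened here)
seq = iaa.Sequential([
    iaa.Fliplr(0.5),
    iaa.Flipud(0.5),
    iaa.LinearContrast((0.75, 1.5)),
], random_order=True)

def augment_then_scale(image):
    # ImageDataGenerator hands this function a single float32 image of
    # shape (height, width, channels); imgaug wants uint8 values in
    # [0, 255], so cast first, augment, then scale to [0, 1] at the end.
    image = seq.augment_image(image.astype(np.uint8))
    return image.astype(np.float32) / 255.0

# note: no rescale argument here, the division now happens after imgaug
X_datagen = ImageDataGenerator(preprocessing_function=augment_then_scale)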

Edit: In the output layer you are using a sigmoid activation function, which forces the output to always lie within [0, 1]. But you are not scaling the masks, which means the masks stay within [0, 255]. For obvious reasons, the model will never be able to output such large values, as it is restricted to [0, 1]. So remove the sigmoid from the last layer and see what happens.
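For concreteness, here is what that change looks like, plus an alternative that is my own assumption rather than part of the answer (keeping the sigmoid and scaling the masks into the same range instead):

# Option A -- what the answer suggests: drop the sigmoid so the output
# is no longer confined to [0, 1].
outputs = Conv2D(1, (1, 1), activation=None) (c9)

# Option B -- my assumption, not from the answer: keep the sigmoid and
# bring the masks into [0, 1] as well (assuming binary masks stored as 0/255).
y_train = y_train.astype('float32') / 255.0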

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0. Source: Stack Overflow.