How to avoid augmenting data in validation split of Keras ImageDataGenerator?

I'm using the following generator:

datagen = ImageDataGenerator(
    fill_mode='nearest',
    cval=0,
    rescale=1. / 255,
    rotation_range=90,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.5,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.5,
)

train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="training",
    batch_size=8,
    seed=123,
    shuffle=True,
    class_mode="other",
    target_size=(64,64))


STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size

valid_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="validation",
    batch_size=8,
    seed=123,
    shuffle=True,
    class_mode="raw",
    target_size=(64, 64))

STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size

The problem is that the validation data is also being augmented, which is not something you'd normally want during training. How do I avoid this? I don't have separate directories for training and validation; I want to use a single dataframe to train the network. Any suggestions?



Solution 1:[1]

The solution my friend found was to use a second, augmentation-free generator for validation, configured with the same validation_split and with shuffling disabled:

datagen = ImageDataGenerator(
    #featurewise_center=True,
    #featurewise_std_normalization=True,
    rescale=1. / 255,
    rotation_range=90,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.5,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.15,
)

valid_datagen = ImageDataGenerator(rescale=1. / 255, validation_split=0.15)

Then you can define the two generators:

train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="training",
    batch_size=64,
    seed=123,
    shuffle=False,
    class_mode="raw",
    target_size=(224,224))


STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size

valid_generator = valid_datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="validation",
    batch_size=64,
    seed=123,
    shuffle=False,
    class_mode="raw",
    target_size=(224, 224))

STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size
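
Since both generators are built from the same dataframe with the same validation_split, it is worth verifying that the training and validation subsets really are disjoint. A minimal sanity check, assuming the filenames attribute that keras_preprocessing iterators expose (the assertion message is just illustrative):

train_files = set(train_generator.filenames)
valid_files = set(valid_generator.filenames)
# The split is taken before any shuffling, so the two sets must not intersect.
assert train_files.isdisjoint(valid_files), "train/validation subsets overlap"
print(len(train_files), "training files,", len(valid_files), "validation files")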

Solution 2:[2]

You can resolve this issue with a small change to your code. Add one more ImageDataGenerator object, named test_datagen, to which you pass only the rescale parameter and no augmentation options; the augmentation settings then live in a separate object (in your case, datagen). You also have to split your training and testing directories before passing them to the train and test data generators. Here is sample code from TensorFlow that you can also refer to.

#For traning data
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
#For testing data
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
model.fit_generator(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)
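
Note that fit_generator is deprecated in TensorFlow 2.x; Model.fit accepts generators directly, so the training call above can be written as follows (same hypothetical step counts as in the sample):

# In TF 2.x, Model.fit accepts Keras iterators and generators directly.
model.fit(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)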

Solution 3:[3]

See this related question's answer: When using Data augmentation is it ok to validate only with the original images?

It says to use an ImageDataGenerator with no augmentation parameters when loading the validation data, for example:

train_gen = ImageDataGenerator(aug_params).flow_from_directory(train_dir)
valid_gen = ImageDataGenerator().flow_from_directory(valid_dir)

model.fit_generator(train_gen, validation_data=valid_gen)

Solution 4:[4]

Try splitting your dataframe into separate dataframes. Then you can create a separate generator for each one, as sketched below.
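
A minimal sketch of that approach, assuming the traindf, train_path, and classes variables from the question; the 0.15 test size, the seed, and scikit-learn's train_test_split are my own choices here:

from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Split the single dataframe once; no validation_split is needed afterwards.
train_df, valid_df = train_test_split(traindf, test_size=0.15, random_state=123)

# Augmentation only for training; the validation generator just rescales.
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=90,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.5,
    horizontal_flip=True,
    vertical_flip=True)
valid_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory=train_path,
    x_col="id",
    y_col=classes,
    batch_size=8,
    seed=123,
    shuffle=True,
    class_mode="raw",
    target_size=(64, 64))

valid_generator = valid_datagen.flow_from_dataframe(
    dataframe=valid_df,
    directory=train_path,
    x_col="id",
    y_col=classes,
    batch_size=8,
    shuffle=False,
    class_mode="raw",
    target_size=(64, 64))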

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Madara
Solution 2 Paras Patidar
Solution 3 Rawnak Yazdani
Solution 4 BanAckerman