How to avoid augmenting data in validation split of Keras ImageDataGenerator?
I'm using the following generator:
datagen = ImageDataGenerator(
fill_mode='nearest',
cval=0,
rescale=1. / 255,
rotation_range=90,
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.5,
horizontal_flip=True,
vertical_flip=True,
validation_split = 0.5,
)
train_generator = datagen.flow_from_dataframe(
dataframe=traindf,
directory=train_path,
x_col="id",
y_col=classes,
subset="training",
batch_size=8,
seed=123,
shuffle=True,
class_mode="other",
target_size=(64,64))
STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size
valid_generator = datagen.flow_from_dataframe(
dataframe=traindf,
directory=train_path,
x_col="id",
y_col=classes,
subset="validation",
batch_size=8,
seed=123,
shuffle=True,
class_mode="raw",
target_size=(64, 64))
STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size
The problem is that the validation data is also being augmented, which I guess is not something you'd want during training. How do I avoid this? I don't have separate directories for training and validation; I want to use a single dataframe to train the network. Any suggestions?
Solution 1:[1]
The solution a friend of mine found was to use a second generator for validation, with the same validation split but no augmentation, and with shuffling turned off.
datagen = ImageDataGenerator(
#featurewise_center=True,
#featurewise_std_normalization=True,
rescale=1. / 255,
rotation_range=90,
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.5,
horizontal_flip=True,
vertical_flip=True,
validation_split = 0.15,
)
valid_datagen = ImageDataGenerator(rescale=1. / 255, validation_split=0.15)
Then you can define the two generators as:
train_generator = datagen.flow_from_dataframe(
dataframe=traindf,
directory=train_path,
x_col="id",
y_col=classes,
subset="training",
batch_size=64,
seed=123,
shuffle=False,
class_mode="raw",
target_size=(224,224))
STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size
valid_generator = valid_datagen.flow_from_dataframe(
dataframe=traindf,
directory=train_path,
x_col="id",
y_col=classes,
subset="validation",
batch_size=64,
seed=123,
shuffle=False,
class_mode="raw",
target_size=(224, 224))
STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size
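Since this approach relies on both generators selecting complementary rows of the same dataframe, it may be worth checking that no image ends up in both subsets. As far as I can tell, the subset is chosen by row position, so this should hold whenever the dataframe and validation_split match, but that is an assumption worth verifying on your own data. Below is a minimal sketch of such a check; check_no_overlap is a hypothetical helper name, and it assumes the iterators returned by flow_from_dataframe expose a filenames list.

```python
def check_no_overlap(train_files, valid_files):
    """Raise ValueError if a file appears in both subsets; return the subset sizes."""
    shared = set(train_files) & set(valid_files)
    if shared:
        examples = sorted(shared)[:3]
        raise ValueError(
            f"{len(shared)} file(s) appear in both subsets, e.g. {examples}"
        )
    return len(train_files), len(valid_files)


# Intended usage with the generators defined above (assumes a `filenames` attribute):
# n_train, n_valid = check_no_overlap(train_generator.filenames,
#                                     valid_generator.filenames)
```

If the check raises, the two generators were probably built from different dataframes or with different validation_split values.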
Solution 2:[2]
You can resolve this with a small change to your code. Add a second ImageDataGenerator object named test_datagen that receives only the rescale parameter and no augmentation options, so that the augmentation settings live in a separate object (datagen in your case). You also have to split your data into separate training and testing directories before passing them to the train and test data generators. Here is some sample code from TensorFlow that you can refer to.
# For training data
train_datagen = ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True)
# For testing data
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
'data/train',
target_size=(150, 150),
batch_size=32,
class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
'data/validation',
target_size=(150, 150),
batch_size=32,
class_mode='binary')
model.fit_generator(
train_generator,
steps_per_epoch=2000,
epochs=50,
validation_data=validation_generator,
validation_steps=800)
Solution 3:[3]
You should see this related question's answer: When using Data augmentation is it ok to validate only with the original images?
It says to use ImageDataGenerator with empty parameters when loading validation data, such as:
train_gen = ImageDataGenerator(**aug_params).flow_from_directory(train_dir)  # aug_params: dict of augmentation options
valid_gen = ImageDataGenerator().flow_from_directory(valid_dir)
model.fit_generator(train_gen, validation_data=valid_gen)
Solution 4:[4]
Try splitting your dataframe into separate dataframes. Then you can create a separate generator for each dataframe.
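A minimal sketch of this idea, assuming a pandas dataframe with an "id" column as in the question. The helper names split_dataframe and make_generators are hypothetical, the augmentation values are copied from the question, and the 15% validation fraction is only an example.

```python
import pandas as pd


def split_dataframe(df, valid_fraction=0.15, seed=123):
    """Return (train_df, valid_df) with no shared rows."""
    # Reset the index so that dropping by index label is unambiguous.
    df = df.reset_index(drop=True)
    # Sample the validation rows at random, then keep the rest for training.
    valid_df = df.sample(frac=valid_fraction, random_state=seed)
    train_df = df.drop(valid_df.index)
    return train_df.reset_index(drop=True), valid_df.reset_index(drop=True)


def make_generators(train_df, valid_df, directory, y_col,
                    target_size=(224, 224), batch_size=64):
    """Build an augmenting training generator and a plain validation generator."""
    # Imported here so split_dataframe can be used without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Augmentation only on the training data.
    train_datagen = ImageDataGenerator(
        rescale=1. / 255,
        rotation_range=90,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.5,
        horizontal_flip=True,
        vertical_flip=True,
    )
    # Validation data is only rescaled.
    valid_datagen = ImageDataGenerator(rescale=1. / 255)

    shared = dict(directory=directory, x_col="id", y_col=y_col,
                  class_mode="raw", target_size=target_size,
                  batch_size=batch_size)
    train_gen = train_datagen.flow_from_dataframe(
        dataframe=train_df, shuffle=True, seed=123, **shared)
    valid_gen = valid_datagen.flow_from_dataframe(
        dataframe=valid_df, shuffle=False, **shared)
    return train_gen, valid_gen
```

With the names from the question, this would be called as train_df, valid_df = split_dataframe(traindf) followed by make_generators(train_df, valid_df, train_path, classes). Because each generator gets its own dataframe, validation_split and subset are not needed at all.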
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Madara |
Solution 2 | Paras Patidar |
Solution 3 | Rawnak Yazdani |
Solution 4 | BanAckerman |