Why is my TensorFlow image classification model overfitting? [closed]
I've been working on a food image classification model.
I started off with the TensorFlow tutorial and modified the model (code below). The model trains fine; however, whenever it gets to ~50-60% validation accuracy it starts overfitting, and I have no idea why. That is the best result I have gotten.
I am using 6 classes from the Food-101 dataset for training and validation, split into 750 training and 250 validation images per class. My data folder structure looks like this:
images
- train
  - class 1 - <750 images>
  - .......
  - class 6 - <750 images>
- validation
  - class 1 - <250 images>
  - .......
  - class 6 - <250 images>
Things I have tried:
- adding additional data augmentation using ImageDataGenerator
- removing layers
- adding additional conv2d and dropout layers
- increasing the number of neurons in the layers
- reducing the number of neurons in the layers
- various models I found online, including the TensorFlow tutorial one
- adding kernel_initializer='he_uniform' and kernel_regularizer=l2(0.001) to the conv2d layers
- doubling the training and validation data by adding the Food-101N dataset as well
- altering the learning rate
- using one-hot encoded labels
Everything I try gives me roughly the same result. I am training the model on an AMD Ryzen 5 3600 CPU with 16 GB of RAM.
What am I doing wrong?
Model summary
Layer (type) Output Shape Param #
=================================================================
sequential_1 (Sequential) (None, 224, 224, 3) 0
_________________________________________________________________
conv2d (Conv2D) (None, 224, 224, 16) 448
_________________________________________________________________
conv2d_1 (Conv2D) (None, 224, 224, 16) 2320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 112, 112, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 112, 112, 32) 4640
_________________________________________________________________
conv2d_3 (Conv2D) (None, 112, 112, 32) 9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 56, 56, 32) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 56, 56, 64) 18496
_________________________________________________________________
conv2d_5 (Conv2D) (None, 56, 56, 64) 36928
_________________________________________________________________
conv2d_6 (Conv2D) (None, 56, 56, 64) 36928
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 28, 28, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 28, 28, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 50176) 0
_________________________________________________________________
dense (Dense) (None, 128) 6422656
_________________________________________________________________
dense_1 (Dense) (None, 6) 774
=================================================================
Total params: 6,532,438
Trainable params: 6,532,438
Non-trainable params: 0
_________________________________________________________________
Here's the full code
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf
import cv2
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from sklearn.metrics import classification_report,confusion_matrix
import time
import pathlib
training_dir = pathlib.Path('.../images/train')
validation_dir = pathlib.Path('.../images/validation')
batch_size = 32
img_height = 224
img_width = 224
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    training_dir,
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    validation_dir,
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
class_names = train_ds.class_names
print(class_names)
print(val_ds.class_names)
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)
resize_and_rescale = tf.keras.Sequential([
    layers.experimental.preprocessing.Resizing(img_height, img_width),
    layers.experimental.preprocessing.Rescaling(1./255)
])
normalized_ds = train_ds.map(lambda x, y: (resize_and_rescale(x, training=True), y))
image_batch, labels_batch = next(iter(normalized_ds))
data_augmentation = keras.Sequential([
    layers.experimental.preprocessing.RandomFlip("horizontal",
                                                 input_shape=(img_height, img_width, 3)),
    layers.experimental.preprocessing.RandomRotation(0.1),
    layers.experimental.preprocessing.RandomZoom(0.1),
])
num_classes = 6
model = Sequential([
    data_augmentation,
    layers.experimental.preprocessing.Rescaling(1./255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.4),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
model.summary()
start_time = time.monotonic()
epochs = 50
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
#plotting and performance metrics
Solution 1:[1]
Overcoming overfitting is a matter of principled work, not just randomly flailing about.
In general (i.e. disregarding the black magic of deep learning otherwise known as double descent), overfitting is a matter of too much effective capacity relative to the training data at hand. This is better known as the bias-variance tradeoff.
So adding augmentations is a form of increasing the data (with some redundancy due to correlation, so it's not a matter of simple multiplication), which helps counter overfitting.
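For example, a more aggressive drop-in replacement for the augmentation pipeline in the posted code might look like the following sketch. It reuses the keras/layers imports and img_height/img_width from the question, and the factors are assumed starting values to experiment with, not tuned recommendations:
data_augmentation = keras.Sequential([
    # flip, rotate, zoom, shift and vary contrast more strongly than the original 0.1 factors
    layers.experimental.preprocessing.RandomFlip("horizontal",
                                                 input_shape=(img_height, img_width, 3)),
    layers.experimental.preprocessing.RandomRotation(0.2),
    layers.experimental.preprocessing.RandomZoom(0.2),
    layers.experimental.preprocessing.RandomTranslation(0.1, 0.1),
    layers.experimental.preprocessing.RandomContrast(0.2),
])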
Reducing capacity is another way to reduce overfitting, so removing layers and/or filters helps reduce capacity. #3 and #4 on your list are the exact opposite of what you would intuitively expect to help...
A third form of overfitting reduction is to rein in capacity via regularization, such as weight decay, dropout, auxiliary loss heads, etc.
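Concretely, in the posted model almost all of the capacity sits in the dense head: per the summary, the Dense layer after Flatten accounts for roughly 6.4M of the 6.5M parameters, so that is a natural place to start. A minimal Keras sketch, with an assumed L2 factor and dropout rate rather than tuned values:
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# dense head with dropout in front of it and L2 weight decay on its kernel
head = keras.Sequential([
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(128, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(6, activation='softmax'),
])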
It's worth noting that augmentation, capacity reduction, and regularization are all different sides of the same coin; augmentations, for instance, can be viewed as a form of regularization whereby we enforce certain invariances on the model by requiring the same output for inputs that differ only by a transformation we deem irrelevant.
Putting it all together, you want to try the following, possibly all in conjunction:
- Remove layers and/or filters (see the combined sketch after this list)
- Add regularization, such as an L2 norm on the weights, dropout, etc.
- Add augmentations, or make them parametrically more extreme (up to a certain point, of course)
- Add auxiliary tasks, depending very much on the data and labels available
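As a combined illustration of the first two points (reusing the data_augmentation pipeline defined above), a slimmed-down and more heavily regularized variant of the posted architecture could look like the sketch below. The filter counts, dense width, L2 factor and dropout rate are assumptions to experiment with, not a known-good configuration:
from tensorflow import keras
from tensorflow.keras import layers, regularizers

num_classes = 6
weight_decay = regularizers.l2(1e-4)  # assumed L2 factor

slim_model = keras.Sequential([
    data_augmentation,  # the (ideally stronger) augmentation pipeline defined earlier
    layers.experimental.preprocessing.Rescaling(1./255),
    # one conv layer per block and fewer filters than the original model
    layers.Conv2D(16, 3, padding='same', activation='relu', kernel_regularizer=weight_decay),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu', kernel_regularizer=weight_decay),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu', kernel_regularizer=weight_decay),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(64, activation='relu', kernel_regularizer=weight_decay),
    layers.Dense(num_classes, activation='softmax'),
])

slim_model.compile(optimizer='adam',
                   loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                   metrics=['accuracy'])
Even with these changes the flattened 28x28x64 feature map still dominates the parameter count, so shrinking the dense width (here to an assumed 64 units) is where most of the capacity reduction comes from.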
Two more important notes!
- Overfitting is sometimes a matter of the training and validation data actually being sampled from different distributions. This is not to say that it's impossible to generalize, but it often means you need to put in extra effort for things to work (e.g. cross-domain generalization, out-of-sample inference, etc.).
- The best practice, when possible, is to add new data! This belongs on the list above as well, not necessarily instead of the recommendations listed.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | desertnaut |