Hyperparameter optimization of transfer learning model with Keras Tuner
I want to perform hyperparameter optimization on my transfer learning model using Keras Tuner. I am not sure how to do this since I have two stages of training:
- freezing the whole network and training only the last layer so it converges to the new classes, and
- unfreezing and training the network (see the sketch after this list).
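For context, the two stages look roughly like this in Keras (a minimal sketch; the base model choice, train_ds / val_ds, and epoch counts are placeholders):

import tensorflow as tf

# Stage 1: freeze the pretrained base and train only the new classification head
base_model = tf.keras.applications.InceptionResNetV2(include_top=False, weights="imagenet",
                                                     input_shape=(512, 512, 3), pooling="avg")
base_model.trainable = False
outputs = tf.keras.layers.Dense(3, activation="softmax")(base_model.output)
model = tf.keras.Model(inputs=base_model.input, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="categorical_crossentropy")
model.fit(train_ds, validation_data=val_ds, epochs=10)  # train_ds / val_ds are placeholders

# Stage 2: unfreeze and fine-tune the whole network, typically with a lower learning rate
base_model.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss="categorical_crossentropy")
model.fit(train_ds, validation_data=val_ds, epochs=10)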
In this paper and this paper, an approach called shared hyperparameter optimization is proposed for hyperparameter optimization with transfer learning. The authors state that sharing one set of hyperparameters across both stages leads to the best results. However, I do not understand what exactly they mean by "one set of hyperparameters" and whether it is possible to implement this using Keras Tuner (they use GPyOpt).
Any help in understanding this concept, or any other ideas or experience on how to perform hyperparameter optimization for transfer learning models, is appreciated!
Solution 1:[1]
I can't be sure without seeing your model code, but in my case most of the hyperparameters that needed tuning were in the output layers of my model, and the base_model was an InceptionResNetV2. You can see that here:
import tensorflow as tf
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.models import Model

def build_model(hp):
    METRICS = [
        'accuracy',
        tf.keras.metrics.AUC(name='auc'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall')
    ]
    inputs = tf.keras.Input(shape=(512, 512, 3))
    # Frozen pretrained base model
    base_model = InceptionResNetV2(include_top=False, weights='imagenet', input_shape=(512, 512, 3), input_tensor=inputs)
    base_model.trainable = False
    # Take the output of the last convolutional layer in the base model
    target_conv_layer = list(filter(lambda x: isinstance(x, tf.keras.layers.Conv2D), base_model.layers))[-1].name
    conv_layer = base_model.get_layer(target_conv_layer)
    x = GlobalAveragePooling2D()(conv_layer.output)
    # Tunable output head: noise level, hidden units, dropout rate
    x = tf.keras.layers.GaussianNoise(hp.Float("gn", min_value=0.3, max_value=0.6, step=0.1))(x)
    x = Dense(units=hp.Int("units", min_value=128, max_value=512, step=32), activation="relu", kernel_regularizer="l1")(x)
    x = Dropout(hp.Float("dropout", min_value=0.1, max_value=0.6, step=0.1))(x)
    predictions = Dense(3, activation="softmax")(x)
    model = Model(inputs=inputs, outputs=predictions)
    # Tunable learning rate, sampled on a log scale
    lr = hp.Float("lr", min_value=0.0001, max_value=0.01, sampling="log")
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=METRICS)
    return model
You can then use this function with Keras Tuner to find the best hyperparameters while the output layers are trained on your dataset and learn to classify images from the convolutional features.
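For example, the search could be run like this (a minimal sketch; train_ds / val_ds, the objective, and the trial counts are placeholders to adapt to your setup):

import keras_tuner as kt

# Stage-1 search: tune the head (and learning rate) while the base model stays frozen
tuner = kt.BayesianOptimization(
    build_model,
    objective="val_accuracy",
    max_trials=20,
    directory="tuning",
    project_name="frozen_head",
)
tuner.search(train_ds, validation_data=val_ds, epochs=10)

best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
best_model = tuner.get_best_models(num_models=1)[0]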
Then you can move on to fine-tuning with these hyperparameters, which to my understanding would be the "one set of hyperparameters".
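Reusing the same values for the fine-tuning stage might look like this (a sketch, assuming the tuner and datasets from above; the epoch counts are placeholders):

# Rebuild the model with the stage-1 hyperparameters and train the head
model = tuner.hypermodel.build(best_hps)
model.fit(train_ds, validation_data=val_ds, epochs=10)

# Unfreeze and fine-tune, recompiling with the same tuned learning rate
model.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=best_hps.get("lr")),
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=10)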
If you wanted to do further hyperparameter tuning, you could save the entire model with the current hyperparameters you found, then define a new build_model function that loads the model and sets it up for fine-tuning. This worked for me, although I am not sure whether it is the best approach. The new build_model function would look something like:
def build_model(hp):
    # Load the model saved after the first tuning stage
    model = tf.keras.models.load_model("your_model")
    model.trainable = True
    # Keep the first 100 layers frozen and fine-tune only the rest
    for layer in model.layers[:100]:
        layer.trainable = False
    # Re-tune the learning rate for the fine-tuning stage
    lr = hp.Float("lr", min_value=0.0001, max_value=0.01, sampling="log")
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=METRICS)  # METRICS as defined in the first build_model
    return model
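You would then run a second search with this build_model, for example (a sketch; the save path and search settings are placeholders):

# Save the stage-1 model under the path the new build_model loads from
best_model.save("your_model")

# Stage-2 search: tune the fine-tuning learning rate
fine_tuner = kt.BayesianOptimization(
    build_model,
    objective="val_accuracy",
    max_trials=10,
    directory="tuning",
    project_name="fine_tune",
)
fine_tuner.search(train_ds, validation_data=val_ds, epochs=10)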
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tom Cotter |