Is it possible to continue training from a specific epoch?
A resource manager I'm using to fit a Keras model limits the access to a server to 1 day at a time. After this day, I need to start a new job. Is it possible with Keras to save the current model at epoch K, and then load that model to continue training epoch K+1 (i.e., with a new job)?
Solution 1:[1]
You can save weights after every epoch by specifying a callback:
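from keras.callbacks import ModelCheckpoint
# Keras 2+ takes epochs= (formerly nb_epoch=); {val_loss} in the file name requires validation data in fit()
weight_save_callback = ModelCheckpoint('/path/to/weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', verbose=0, save_best_only=False, mode='auto')
model.fit(X_train, y_train, batch_size=batch_size, epochs=nb_epoch, callbacks=[weight_save_callback])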
This will save the weights after every epoch. You can then load them with:
from keras.models import Sequential
model = Sequential()
model.add(...)
model.load_weights('path/to/weights.hf5')  # load_weights, not load
Of course your model needs to be the same in both cases.
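With one file written per epoch, a new job has to decide which file to load. A minimal sketch, assuming the checkpoints follow the weights.{epoch:02d}-{val_loss:.2f}.hdf5 naming used by the callback in this answer; the helper name latest_checkpoint and its directory argument are hypothetical and not part of Keras:

```python
import re
from pathlib import Path

# Matches names such as 'weights.07-0.35.hdf5'. The naming scheme is an
# assumption taken from the ModelCheckpoint file name pattern in this answer.
_CKPT_RE = re.compile(r"^weights\.(\d+)-.*\.hdf5$")


def latest_checkpoint(directory):
    """Return (epoch_number, path) of the highest-numbered checkpoint, or None."""
    best = None
    for path in Path(directory).iterdir():
        match = _CKPT_RE.match(path.name)
        if match is None:
            continue  # ignore files that are not checkpoints
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return best
```

In the new job you would rebuild the model and pass the returned path to the weight-loading call. Whether the number in the file name is zero-based or one-based may differ between Keras versions, so check it against your training output before using it as a starting epoch.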
Solution 2:[2]
You can pass the initial_epoch argument to model.fit(). This allows you to continue training from a specific epoch, so that epoch numbering in logs and callbacks picks up where the previous run stopped.
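A minimal sketch of how the argument is typically used, assuming Keras 2 or later, where fit() takes epochs (older releases called it nb_epoch). In combination with initial_epoch, epochs is the index of the final epoch rather than the number of additional epochs to run. The wrapper resume_fit and its parameter names are hypothetical:

```python
def resume_fit(model, x, y, epochs_done, total_epochs, **fit_kwargs):
    """Continue training a compiled Keras model.

    epochs_done  -- number of epochs already completed by earlier jobs
    total_epochs -- epoch count for the whole training run, across all jobs
    """
    if epochs_done >= total_epochs:
        return None  # nothing left to train
    # Keras numbers epochs from 0, so after `epochs_done` finished epochs the
    # next one has index `epochs_done`. Training stops once epoch index
    # `total_epochs` is reached, i.e. after total_epochs - epochs_done epochs.
    return model.fit(
        x, y,
        initial_epoch=epochs_done,
        epochs=total_epochs,
        **fit_kwargs
    )
```

For example, resume_fit(model, X_train, y_train, epochs_done=5, total_epochs=20) would run the 15 remaining epochs on a model restored from the previous job.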
Solution 3:[3]
You can automatically start your training at the next epoch.
What you need is to keep track of your training with a training log file, as follows:
import sys
from keras.callbacks import ModelCheckpoint, CSVLogger

if len(sys.argv) == 1:
    model = ...  # you start training normally, no command line arguments
    model.compile(...)
    i_epoch = -1  # you need this to start at epoch 0
    app = False  # you want to start logging from scratch
else:
    from keras.models import load_model
    model = load_model(sys.argv[1])  # you give the saved model as input file
    with open(csvloggerfile) as f:  # you use your training log to get the right epoch number
        i_epoch = list(f)
        i_epoch = int(i_epoch[-2][:i_epoch[-2].find(',')])
    app = True  # you want to append to the log file

checkpointer = ModelCheckpoint(savemodel...)
csv_logger = CSVLogger(csvloggerfile, append=app)
model.fit(X, Y, initial_epoch=i_epoch+1, callbacks=[checkpointer, csv_logger])
That's all folks!
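The log parsing above relies on string slicing and a fixed line index. A slightly more defensive sketch using the standard csv module, assuming the file was written by CSVLogger with its default comma separator and an epoch column; the function name last_logged_epoch is hypothetical:

```python
import csv
import os


def last_logged_epoch(log_path):
    """Return the last epoch index recorded in a CSVLogger file, or -1 if none."""
    if not os.path.exists(log_path):
        return -1  # no log yet: a fresh run
    last = -1
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            value = (row.get("epoch") or "").strip()
            if value.isdigit():  # skip rows without a usable epoch value
                last = int(value)
    return last
```

Passing initial_epoch=last_logged_epoch(csvloggerfile) + 1 to model.fit then starts at epoch 0 for a fresh run and at the next unlogged epoch otherwise. Unlike the snippet above, which reads the second-to-last line of the file, this returns the last row that has an epoch value; which one is right for you depends on how your log file ends.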
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Henryk Borzymowski |
Solution 3 | |