Keras LSTM for text prediction does not learn
I'm trying to train a Keras LSTM model as follows:
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(256))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(Y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(X, Y, epochs=50, batch_size=100, callbacks=callbacks_list)
The model should predict chess games. For example, a parsed and encoded chess game looks like this:
[3379, 905, 2967, 4705, 4569, 8954, 6732, 5282, 9178, 9052, 486, 1665, 2459, 3736, 1444, 9409, 5841, 10023, 9484, 5841, 5541, 2605, 2178, 7843, 4679, 7242, 4755, 4755, 3530, 64, 6468, 407, 5105, 8224, 2892, 3736, 9026, 3819, 5592, 9178, 5862, 5845, 4246, 8380, 9324, 4778, 4341, 2068, 344, 9004, 7089, 2180, 6549, 2174, 9754, 8602, 4339, 7291, 7291, 9968, 7920, 8392, 6004, 7516, 6541, 9409, 6215, 2263, 5098, 2672, 8573, 6537, 7073, 4551, 9004]
My X and Y look like this. X is built by splitting the game into sliding windows of 5 moves:
#X:
[[3379, 905, 2967, 4705, 4569] [905, 2967, 4705, 4569, 8954] [2967, 4705, 4569, 8954, 6732] ... [2672, 8573, 6537, 7073, 4551]]
Y is the list of moves that must be predicted at the end of each window of moves:
#Y:
[3379, 905, 2967, 4705, 4569, 8954, 6732 ... 2672, 8573, 6537, 7073, 4551, 9004]
Y was then encoded in base 5 as follows:
[[3 1 0 1 3 2] [1 1 0 1 2 1] [2 1 0 1 3 4] ... ]
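For illustration, here is a minimal sketch of how such windows and base-5 targets could be built. The window size of 5 and the 6-digit base-5 width are taken from the examples above; the helper names (build_windows, to_base5) are hypothetical, and the exact encoding used in the question may differ:

import numpy as np

def build_windows(game, window=5):
    # Slide a window of `window` moves over one encoded game.
    # Each window becomes one row of X; the move that follows it
    # becomes the corresponding entry of Y.
    X, Y = [], []
    for i in range(len(game) - window):
        X.append(game[i:i + window])
        Y.append(game[i + window])
    return np.array(X), np.array(Y)

def to_base5(value, digits=6):
    # Encode one move id as a fixed-width base-5 digit vector,
    # mirroring the Y encoding shown above (6 digits cover ids up to 15624).
    out = []
    for _ in range(digits):
        out.append(value % 5)
        value //= 5
    return out[::-1]  # most significant digit first

game = [3379, 905, 2967, 4705, 4569, 8954, 6732, 5282, 9178, 9052]
X, Y = build_windows(game, window=5)
Y_base5 = np.array([to_base5(int(y)) for y in Y])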
The model doesn't learn anything. The first time, I ran model.fit with 100 epochs and a batch_size of 64 using normalised X, but the loss grew a lot: at the first epoch the loss was 0.56, while at the last it was 266.50. I then removed the normalization and ran model.fit with 50 epochs and a batch_size of 100, and now the results are stable: at the first epoch the loss is 36.11 and the accuracy is 0.316; at the 50th epoch the loss is 46.80 and the accuracy is 0.317.
Does anyone know why my model does not learn?
UPDATE 24/01/22:
Following @MarcelB's comment, I am posting the results with 1 sample and with 10 samples.
I changed the batch_size of my model from 100 to 5.
The following picture shows the graphs of my training run with 1 sample:
train with 1 sample
I don't know if it's right, but I joined the 10 samples (10 games) together as one big game. The following picture shows the graphs of my training run with 10 samples: train with 10 samples
Solution 1:[1]
Dropout interferes with the learning process by randomly deactivating a fraction of units or connections during each training step. You have two Dropout layers in your model, which means that after each LSTM layer about 20% of its outputs are zeroed out before they reach the next layer; this can seriously diminish performance. You should not include dropout if your model does not overfit. Remove these layers and check again.
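For illustration, a sketch of the model from the question with the Dropout layers removed (layer sizes kept as in the question, and X and Y assumed to be the arrays described there; whether this alone makes the model learn is something to verify experimentally):

import tensorflow as tf
from tensorflow.keras import layers

# Same stacked-LSTM architecture as in the question, but without Dropout.
model = tf.keras.Sequential([
    layers.LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True),
    layers.LSTM(256),
    layers.Dense(Y.shape[1], activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])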
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Arne Decker |