How should the output of my embedding layer look? Keras to PyTorch

I am in the process of translating a Keras implementation to a PyTorch one. After the full conversion my model was not converging fast enough, although the loss did seem to be decreasing. As I was tracing back my steps, I noticed something a bit odd about my embedding layer. Let me explain the data: I have a batch of 4 sequences, each with a sequence length of 100, and a vocab size of 83. I am working with songs in ABC notation, so each song can contain 83 different symbols and is 100 symbols long. So now I have an ndarray of shape (4, 100) which contains my 4 song sequences. Let's call it x. Now if I pass x into an embedding layer in Keras:

tf.keras.layers.Embedding(83, 256, batch_input_shape=[4, None])(x).numpy()

I get a "narrower" set of values for each batch than I do in PyTorch; does this affect my convergence? For example, the minimum value in the first batch is -0.04999 and the maximum value is 0.04999. Now if I pass the same x into my PyTorch embedding layer:

torch.nn.Embedding(4*100, 256)(torch.tensor(x)).detach().numpy()

I get a "wider" set of values for each batch. The maximum value is 3.3865 and the minimum value is -3.917.

My question is, should I be worried that this is a cause for my model not converging properly?



Solution 1:[1]

The two frameworks are not doing exactly the same thing: the values coming out of the embedding layer differ because the layers start from different random initializations, so some difference is expected even before any training has happened. Once you train, and once the embedding output passes through later layers (for example Conv or LSTM layers) that filter and stabilize the signal toward the target actions, this difference on its own should not prevent convergence. The embedding inspection and the small game example below illustrate this.
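If you want the PyTorch embedding to start in the same narrow range as the Keras one, you can re-initialize its weights. The snippet below is only a minimal sketch, not part of the original answer: it assumes the vocabulary size of 83 and embedding dimension of 256 from the question, and the ±0.05 range matches the min/max printed from the Keras layer below (Keras' Embedding defaults to a uniform initializer in roughly that range, while torch.nn.Embedding draws its initial weights from a standard normal distribution).

import torch
import torch.nn as nn

### num_embeddings should be the vocabulary size (83), not batch * sequence length
emb = nn.Embedding(num_embeddings=83, embedding_dim=256)

### PyTorch's default N(0, 1) initialization is why the question sees values
### around +/-3.9; re-initialize uniformly in [-0.05, 0.05] to mimic Keras
nn.init.uniform_(emb.weight, a=-0.05, b=0.05)

x = torch.randint(0, 83, (4, 100))            # a (4, 100) batch of symbol indices
out = emb(x)
print(out.min().item(), out.max().item())     # now roughly within [-0.05, 0.05]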

Embedding layer:

layer_1 = model.get_layer( name="embedding_layer" )
### <keras.layers.embeddings.Embedding object at 0x000001AD42102A30>
print(layer_1)                                                          # <keras.layers.embeddings.Embedding object at ...>
print(layer_1.get_weights()[0].shape)                                   # (83, 256)
print('min: ' + str(np.min(layer_1.get_weights()[0])))                  #  min: -0.049991023
print('max: ' + str(np.max(layer_1.get_weights()[0])))                  #  max: 0.049998153

Output:

First run:
<keras.layers.embeddings.Embedding object at 0x000001FA0BE74A30>
(83, 256)
min: -0.049991023
max: 0.049998153

Second run:
<keras.layers.embeddings.Embedding object at 0x00000214A1C34A30>
(83, 256)
min: -0.04999887
max: 0.049993087

Third run:
<keras.layers.embeddings.Embedding object at 0x00000283B20F3A30>
(83, 256)
min: -0.049999725
max: 0.049998928
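
For comparison, here is the same kind of inspection on the PyTorch side. This is only a sketch added for illustration, assuming a freshly created torch.nn.Embedding with the question's shape; because the weights are drawn from a standard normal distribution, the min/max come out several units wide instead of within ±0.05:

import torch
import torch.nn as nn

emb = nn.Embedding(83, 256)                   # vocab size 83, embedding dim 256
w = emb.weight.detach().numpy()

print(w.shape)                                # (83, 256)
print('min: ' + str(w.min()))                 #  e.g. around -4, varies per run
print('max: ' + str(w.max()))                 #  e.g. around  4, varies per run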

Sample of actions from limited inputs:

This shows that the random actions are working correctly with a few simple lines of code:

### p is the game environment (e.g. a PLE instance) and posibility_actions is
### assumed to be its list of available actions, e.g. p.getActionSet()
gameState = p.getGameState()
### {'player_x': 102, 'player_vel': 0.0, 'fruit_x': 30, 'fruit_y': -120}

player_x_array = gameState['player_x']
player_vel_array = gameState['player_vel']
fruit_x_array = gameState['fruit_x']
fruit_y_array = gameState['fruit_y']

### if player_x is less than fruit_x, go left
var_1 = player_x_array - fruit_x_array                          ## right
var_2 = player_x_array - fruit_x_array                          ## left
var_3 = fruit_y_array - ( player_x_array - fruit_x_array )

print(str(var_1) + " " + str(var_2) + " " + str(var_3))

### scale a random normal sample by the state differences and pick the
### action with the highest softmax score
temp = tf.random.normal([len(posibility_actions)], 1, 0.2, tf.float32)
temp = np.asarray(temp) * np.asarray([ var_1, var_2, var_3 ])
temp = tf.nn.softmax(temp)
action = int(np.argmax(temp))

reward = p.act(posibility_actions[action])
print('random action: ' + str(posibility_actions[action]))

This should not be a problem once the values pass through several more layers, which filter out the information that is not needed; compare the inputs and outputs of each layer to see what they actually produce.

... Random actions
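
To illustrate the point above about later layers: whatever the scale of the embedding output, an LSTM squashes its hidden state through tanh, so the downstream layers see values in a comparable bounded range either way. This is a minimal sketch added for illustration, not from the original answer, using the question's shapes and a hypothetical hidden size of 128:

import torch
import torch.nn as nn

x = torch.randint(0, 83, (4, 100))            # (batch=4, seq_len=100) symbol indices

for name in ("keras-like uniform", "pytorch default normal"):
    emb = nn.Embedding(83, 256)
    if name == "keras-like uniform":
        nn.init.uniform_(emb.weight, -0.05, 0.05)
    lstm = nn.LSTM(input_size=256, hidden_size=128, batch_first=True)
    out, _ = lstm(emb(x))                     # tanh keeps the LSTM output in (-1, 1)
    print(name, float(out.min()), float(out.max()))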

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Martijn Pieters