BERT embedding layer raises 'ValueError: A target array with shape' with BiLSTM in Keras/TensorFlow

I'm having problems integrating a BERT embedding layer into a BiLSTM model for a text classification task.

My dataset has two columns per row: text and polarity.

text = string/tweet

polarity = can be 0 or 1

So the shape of the training data is (1500, 2).

I am generating BERT embeddings following this code https://github.com/strongio/keras-bert/blob/master/keras-bert.ipynb

I want to add a Bi-LSTM between the BERT layer and the Dense layer. I have done it like this:

# Build model
def build_model(max_seq_length): 
    embedding_size = 768
    in_id = tf.keras.layers.Input(shape=(max_seq_length,), name="input_ids")
    in_mask = tf.keras.layers.Input(shape=(max_seq_length,), name="input_masks")
    in_segment = tf.keras.layers.Input(shape=(max_seq_length,), name="segment_ids")
    bert_inputs = [in_id, in_mask, in_segment]
    
    bert_output = BertLayer(n_fine_tune_layers=3, pooling="mean")(bert_inputs)
    bert_output = tf.keras.layers.Reshape((max_seq_length, embedding_size))(bert_output) 
    bilstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, dropout=0.2,recurrent_dropout=0.2,return_sequences=True))(bert_output)
    output = tf.keras.layers.Dense(1, activation="softmax")(bilstm)
    
    model = tf.keras.models.Model(inputs=bert_inputs, outputs=output)
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    
    return model

def initialize_vars(sess):
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    K.set_session(sess)

model = build_model(max_seq_length)

# Instantiate variables
initialize_vars(sess)

model.fit(
    [train_input_ids, train_input_masks, train_segment_ids], 
    train_labels,
    validation_data=([test_input_ids, test_input_masks, test_segment_ids], test_labels),
    epochs=1,
    batch_size=32
)

It gives an error: ValueError: A target array with shape (1500, 1) was passed for an output of shape (None, 256, 1) while using as loss `binary_crossentropy`. This loss expects targets to have the same shape as the output.

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/init_ops.py:97: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/init_ops.py:97: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/init_ops.py:97: calling Orthogonal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/init_ops.py:97: calling Orthogonal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/init_ops.py:97: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/init_ops.py:97: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_ids (InputLayer)          [(None, 256)]        0                                            
__________________________________________________________________________________________________
input_masks (InputLayer)        [(None, 256)]        0                                            
__________________________________________________________________________________________________
segment_ids (InputLayer)        [(None, 256)]        0                                            
__________________________________________________________________________________________________
bert_layer (BertLayer)          (None, 768)          110104890   input_ids[0][0]                  
                                                                 input_masks[0][0]                
                                                                 segment_ids[0][0]                
__________________________________________________________________________________________________
reshape (Reshape)               (None, 256, 768)     0           bert_layer[0][0]                 
__________________________________________________________________________________________________
bidirectional (Bidirectional)   (None, 256, 256)     918528      reshape[0][0]                    
__________________________________________________________________________________________________
dense (Dense)                   (None, 256, 1)       257         bidirectional[0][0]              
==================================================================================================
Total params: 111,023,675
Trainable params: 22,182,401
Non-trainable params: 88,841,274
__________________________________________________________________________________________________
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-827856e3678d> in <module>()
      9     validation_data=([test_input_ids, test_input_masks, test_segment_ids], test_labels),
     10     epochs=1,
---> 11     batch_size=32
     12 )

3 frames
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/engine/training_utils.py in check_loss_and_target_compatibility(targets, loss_fns, output_shapes)
    739           raise ValueError('A target array with shape ' + str(y.shape) +
    740                            ' was passed for an output of shape ' + str(shape) +
--> 741                            ' while using as loss `' + loss_name + '`. '
    742                            'This loss expects targets to have the same shape '
    743                            'as the output.')

ValueError: A target array with shape (1500, 1) was passed for an output of shape (None, 256, 1) while using as loss `binary_crossentropy`. This loss expects targets to have the same shape as the output.

What can I do to resolve this? Does it have something to do with which activation or loss is being used? How can the shapes be matched?

Any help will be appreciated.



Solution 1:[1]

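The loss function you specify (binary cross-entropy, and likewise mean-based or logarithmic losses) is computed element-wise against the model output, so the target and the output must have the same shape. This is independent of the optimizer, Adam or otherwise. You may try changing the loss function, but the real fix is to make the output shape match the target shape. In the sample below I add one layer that makes a Bi-LSTM possible on top of the existing BERT model I load.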

ValueError: A target array with shape (1500, 1) was passed for output of shape (None, 256, 1) while using as loss `binary_crossentropy`. This loss expects targets to have the same shape as the output.
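
To make the shape mismatch concrete, here is a minimal sketch of one way to end up with an output of shape (None, 1). It assumes random arrays standing in for BERT's per-token output (shape (batch, 256, 768)) instead of the real BertLayer; the input name bert_sequence_output and the tiny random dataset are made up for illustration. Setting return_sequences=False makes the Bi-LSTM emit one vector per example rather than one per token, and a single sigmoid unit then lines up with 0/1 labels of shape (N, 1).

```python
import numpy as np
import tensorflow as tf

max_seq_length = 256  # sequence length shown in the question's model summary
embedding_size = 768  # embedding size used in the question

# Stand-in (assumption) for BERT's per-token output:
# shape (batch, max_seq_length, embedding_size).
seq_in = tf.keras.layers.Input(
    shape=(max_seq_length, embedding_size), name="bert_sequence_output"
)

# return_sequences=False keeps only the final state of each direction,
# so the output is (None, 256) instead of (None, 256, 256).
bilstm = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(128, return_sequences=False)
)(seq_in)

# One sigmoid unit for a binary label gives an output of shape (None, 1).
output = tf.keras.layers.Dense(1, activation="sigmoid")(bilstm)

model = tf.keras.models.Model(inputs=seq_in, outputs=output)
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Tiny random dataset (illustrative only): labels have shape (N, 1).
x = np.random.rand(4, max_seq_length, embedding_size).astype("float32")
y = np.random.randint(0, 2, size=(4, 1)).astype("float32")

model.fit(x, y, epochs=1, batch_size=2, verbose=0)
print(model.output_shape)
```

With the BertLayer from the question, pooling="mean" already collapses the sequence to (None, 768), so a Bi-LSTM would presumably need the layer's per-token output rather than the pooled one.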

[ Sample ]:

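#########################################################
# Functions
#########################################################
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # assumption: needed to register the ops used by the preprocessor

# export_dir, export_dir_2, max_seq_length and dataset are assumed to be defined elsewhere
# Build model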
def build_model(max_seq_length): 
    in_id = tf.keras.layers.Input(shape=(max_seq_length,), name="input_ids")
    in_mask = tf.keras.layers.Input(shape=(max_seq_length,), name="input_masks")
    in_segment = tf.keras.layers.Input(shape=(max_seq_length,), name="segment_ids")
    bert_inputs = [in_id, in_mask, in_segment]
    
    options = tf.saved_model.LoadOptions(
    allow_partial_checkpoint=False,
    experimental_io_device="/physical_device:GPU:0",
    experimental_skip_checkpoint=True
    )
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
    preprocessor = hub.KerasLayer(export_dir)
    encoder = hub.KerasLayer( export_dir_2, trainable=False, load_options=options)
    encoder_inputs = preprocessor(text_input)
    outputs = encoder(encoder_inputs)

    # NOTE: the reshape / bi-lstm branch is built here but is not connected to
    # the model returned below, so it does not appear in the summary output.
    reshape = tf.keras.layers.Reshape((512, 1))(outputs['default'])
    ### Add bi-lstm layer as requirements ###
    bilstm = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, dropout=0.2,recurrent_dropout=0.2,return_sequences=True))(reshape)
    #########################################
    
    output = tf.keras.layers.Dense(1, activation="sigmoid")(bilstm)

    intermediate_layer = tf.keras.layers.Dense(512, activation='relu', name='intermediate_layer')(outputs['default'])
    # A single output unit needs sigmoid: softmax over one unit always returns 1.0
    output_layer = tf.keras.layers.Dense(1, activation='sigmoid', name='output_layer')(intermediate_layer)
    sentiment_model = tf.keras.Model(inputs=[text_input], outputs=output_layer)
    sentiment_model.summary()
    
    # Binary loss and metric to match the single 0/1 output
    optim = tf.keras.optimizers.Adam(learning_rate=1e-5, decay=1e-6)
    loss_func = tf.keras.losses.BinaryCrossentropy()
    acc = tf.keras.metrics.BinaryAccuracy('accuracy')
    sentiment_model.compile(optimizer=optim, loss=loss_func, metrics=[acc])


    return sentiment_model

def initialize_vars():
    with tf.compat.v1.Session() as sess:
        init = tf.compat.v1.global_variables_initializer()
        init.run()

#########################################################
# Model Initialize
#########################################################
sentiment_model = build_model(max_seq_length)

# Instantiate variables
initialize_vars()

hist = sentiment_model.fit(
    dataset,
    epochs=2
)

[ Output ]:

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to
==================================================================================================
 input_1 (InputLayer)           [(None,)]            0           []

 keras_layer (KerasLayer)       {'input_mask': (Non  0           ['input_1[0][0]']
                                e, 128),
                                 'input_word_ids':
                                (None, 128),
                                 'input_type_ids':
                                (None, 128)}

 keras_layer_1 (KerasLayer)     {'sequence_output':  28763649    ['keras_layer[0][0]',
                                 (None, 128, 512),                'keras_layer[0][1]',
                                 'default': (None,                'keras_layer[0][2]']
                                512),
                                 'pooled_output': (
                                None, 512),
                                 'encoder_outputs':
                                 [(None, 128, 512),
                                 (None, 128, 512),
                                 (None, 128, 512),
                                 (None, 128, 512)]}

 intermediate_layer (Dense)     (None, 512)          262656      ['keras_layer_1[0][0]']

 output_layer (Dense)           (None, 1)            513         ['intermediate_layer[0][0]']

==================================================================================================
Total params: 29,026,818
Trainable params: 263,169
Non-trainable params: 28,763,649
__________________________________________________________________________________________________
2022-04-04 17:12:28.901162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4634 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
Epoch 1/2
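
On the activation part of the question: a softmax over a single unit normalizes one value by itself, so it returns 1.0 for every input and the layer cannot learn anything. The plain-Python sketch below (no TensorFlow; the helpers softmax and sigmoid are written here for illustration and are not library functions) shows the difference from a sigmoid, which is the usual choice for a single-unit 0/1 output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


logits = (-3.0, 0.0, 3.0)

# Softmax applied to a one-element vector: the value is divided by itself.
single_unit_softmax = [softmax([z])[0] for z in logits]

# Sigmoid applied to the same values actually depends on the input.
single_unit_sigmoid = [sigmoid(z) for z in logits]

print(single_unit_softmax)  # [1.0, 1.0, 1.0]
print(single_unit_sigmoid)
```

So with Dense(1), activation="sigmoid" together with a binary cross-entropy loss is the combination that can actually separate the two classes.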


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Martijn Pieters