Keras CNN, Incompatible shapes: [32,20,20,1] vs. [32,1]

I'm trying to reconstruct in Python the Gradient Transformation Network (GTN) model from the paper "Single Image Super-Resolution Based on Deep Learning and Gradient Transformation" by Chen et al. (2016).

Here is the code I've written so far:

# Loading of data
from tensorflow.keras.preprocessing.image import ImageDataGenerator

trdata = ImageDataGenerator()
traindata = trdata.flow_from_directory(directory='train', target_size=(36,36))

tsdata = ImageDataGenerator()
testdata = tsdata.flow_from_directory(directory='test', target_size=(20,20))

I'm using the BSDS500 dataset to train the model, as in the paper. The images are first run through the SRCNN model of Dong et al. (2014) before each image's X and Y gradients are extracted. The target-size values, like all the other model parameters, are taken from the paper. The input images are the extracted X-gradient images, each sliced into 36x36 blocks. The validation images are the same but sliced into 20x20 blocks, since the paper states that the model outputs 20x20 blocks.
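The paper's exact gradient operator and blocking scheme aren't shown here, so as a rough illustration only, here is a minimal sketch of the kind of preprocessing described above (x_gradient and slice_into_blocks are hypothetical helpers; the simple forward difference stands in for whatever gradient operator the paper uses):

import numpy as np

def x_gradient(image):
    # Horizontal first-order difference as a stand-in X-gradient;
    # the paper may specify a different operator.
    grad = np.zeros_like(image, dtype=np.float32)
    grad[:, 1:] = image[:, 1:].astype(np.float32) - image[:, :-1].astype(np.float32)
    return grad

def slice_into_blocks(image, block_size):
    # Split an image into non-overlapping block_size x block_size tiles,
    # dropping partial tiles at the borders.
    h, w = image.shape[:2]
    blocks = []
    for i in range(0, h - block_size + 1, block_size):
        for j in range(0, w - block_size + 1, block_size):
            blocks.append(image[i:i + block_size, j:j + block_size])
    return np.stack(blocks)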

# Define GTN Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.optimizers import SGD

GTN = Sequential()

# add model layers
GTN.add(Conv2D(filters=64, kernel_size=(9, 9), activation='relu', input_shape=(36, 36, 3)))
GTN.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
GTN.add(Conv2D(filters=1, kernel_size=(5, 5), activation='relu'))
GTN.summary()

# define optimizer
sgd = SGD()
# compile model
GTN.compile(optimizer=sgd, loss='mean_squared_error', metrics=['MeanSquaredError'])

#model fitting
history = GTN.fit(traindata, validation_data=testdata, epochs=10, steps_per_epoch=20)

The epoch count and steps are kept low for testing purposes. The GTN.summary() output:

Found 23400 images belonging to 1 classes.
Found 25200 images belonging to 1 classes.
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 28, 28, 64)        15616     
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 32)        51232     
                                                                 
 conv2d_2 (Conv2D)           (None, 20, 20, 1)         801       
                                                                 
=================================================================
Total params: 67,649
Trainable params: 67,649
Non-trainable params: 0
_________________________________________________________________

The error I get when running the code is:

Incompatible shapes: [32,20,20,1] vs. [32,1]

I have tried adding a Flatten() layer, but this makes the final layer:

flatten (Flatten) (None, 400) 0

It finishes one epoch but then raises an error:

Input to reshape is a tensor with 512 values, but the requested shape requires a multiple of 400

Do I have to manually reshape the input? Or is the problem in the way I pass traindata and testdata to GTN.fit()? I'm also unsure about adding the Flatten() layer, as the paper specifically describes three Conv2D() layers.

Update:

I reverted to not slicing the input images into 36x36 blocks and the validation images into 20x20 blocks; I seem to have misunderstood the paper. But I still get the same error.

I then added the GlobalAveragePooling2D and Dense layers as recommended by M.Innat. The model now runs, but I worry that it is no longer faithful to the paper's architecture. I did a run with 400 epochs, and the MSE values for training and validation start very high and approach near zero. Almost too good, I think? The paper only states 10,000,000 iterations and does not explicitly give the number of epochs or steps per epoch.

[Plot: training and validation MSE over 400 epochs]



Solution 1:[1]

You seem to be confusing validation images with training labels.

For a 36x36x3 input image, your model will produce a 20x20x1 output. Since you used an MSE loss, the ground truth for each image must have the same shape as that output. And because you specified the input shape (36x36x3) in the model definition, validation input images must be of that shape as well.

The generator returned by trdata.flow_from_directory produces a single class label for each image, so these (image, label) pairs cannot be used to train your model. A proper data loader should produce (36x36x3 input, 20x20x1 target) pairs. Please review your data loader.
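As an illustration only, a data loader with the right contract can be as simple as a generator that yields such pairs; input_blocks and target_blocks here are hypothetical arrays of shape (N, 36, 36, 3) and (N, 20, 20, 1):

import numpy as np

def paired_batches(input_blocks, target_blocks, batch_size=32):
    # Yield (x, y) batches where y has the same spatial shape as
    # the model output, i.e. (batch_size, 20, 20, 1).
    n = len(input_blocks)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n - batch_size + 1, batch_size):
            batch = idx[start:start + batch_size]
            yield input_blocks[batch], target_blocks[batch]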

Solution 2:[2]

Based on your error logs and the loss function you used, you may need to modify your network as follows. Note that your current model's output shape is (20, 20, 1) while your target label shape is (1,); these are incompatible when computing the cost function. If the scalar target is what your end goal actually requires, then you need to change your model's output shape (like below).

# Define GTN Model
GTN = Sequential()

# add model layers
GTN.add(Conv2D(filters=64, kernel_size=(9, 9), 
               activation='relu', input_shape=(36, 36, 3)))
GTN.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
GTN.add(GlobalAveragePooling2D())
GTN.add(Dense(1, activation=None))
GTN.summary()
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_18 (Conv2D)          (None, 28, 28, 64)        15616     
                                                                 
 conv2d_19 (Conv2D)          (None, 24, 24, 32)        51232     
                                                                 
 global_average_pooling2d_6   (None, 32)               0         
 (GlobalAveragePooling2D)                                        
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
=================================================================

Otherwise, it's entirely possible that your model construction is fine and you instead need to revisit the target shape; the output and target shapes must match.
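For reference, with 'valid' padding (the Keras default) each Conv2D layer shrinks the spatial size by kernel_size - 1, which is exactly where the 20x20 output above comes from. A quick check:

def conv_output_size(input_size, kernel_size):
    # Output size of a Conv2D layer with 'valid' padding and stride 1.
    return input_size - kernel_size + 1

size = 36
for kernel in (9, 5, 5):
    size = conv_output_size(size, kernel)
    print(size)  # 28, then 24, then 20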


Update

Based on the comments, here is another way to build the model. Note that I used your constructed model as the reference, i.e. input_shape=(36, 36, 3) and output shape (20, 20, 1). As a side note, for super-resolving an input image I'm not sure what the thinking is behind downsampling + grayscaling the targets, as you did; I think you should consider that rigorously.

Imports

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing import image_dataset_from_directory

import os
import numpy as np

Get the dataset: BSDS500, the same as yours.

dataset_url = "http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/BSR/BSR_bsds500.tgz"
data_dir = keras.utils.get_file(origin=dataset_url, fname="BSR", untar=True)
root_dir = os.path.join(data_dir, "BSDS500/data")

Make Training Set

I'm making a training set here, but you may need to create a validation set as well; see the sketch after the code below. Also check the Keras data-loading tutorials.

batch_size = 8
input_crop_size  = 36
input_mode = 'rgb'

train_ds = image_dataset_from_directory(
    root_dir,
    batch_size=batch_size,
    image_size=(input_crop_size, input_crop_size),
    color_mode=input_mode,
    label_mode=None,
)
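One way to get a validation set (a sketch, not part of the original answer; the split ratio and seed are arbitrary) is to use the validation_split and subset arguments of image_dataset_from_directory. Note that the training subset would then have to be created with the same validation_split and seed:

val_ds = image_dataset_from_directory(
    root_dir,
    batch_size=batch_size,
    image_size=(input_crop_size, input_crop_size),
    color_mode=input_mode,
    label_mode=None,
    validation_split=0.2,   # hold out 20% of the files
    subset='validation',
    seed=1337,              # must match the seed used for the training subset
)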

Make Target Set

target_crop_size = 20
target_mode = 'grayscale'

def resize_target(sample):
    if target_mode == 'grayscale':
        sample = tf.image.rgb_to_grayscale(sample)

    sample = tf.image.resize(sample, 
                            [target_crop_size, target_crop_size])
    return sample

train_ds = train_ds.map(
    lambda x: (x, resize_target(x))
)
train_ds = train_ds.prefetch(buffer_size=32)

Let's check, and we will see that the input and output shapes match your reference. If you don't want that, you can easily change the above parameters to adjust.

x = next(iter(train_ds))
x[0].shape, x[1].shape
(TensorShape([8, 36, 36, 3]), TensorShape([8, 20, 20, 1]))

Your Model

if target_mode == 'grayscale':
    last_layer_channel = 1
else:
    last_layer_channel = 3

# Define GTN Model
GTN =  keras.Sequential()

# add model layers
GTN.add(layers.Conv2D(filters=64, kernel_size=(9, 9),
        activation='relu', input_shape=(36, 36, 3)))
GTN.add(layers.Conv2D(filters=32, kernel_size=(5, 5),
        activation='relu'))
GTN.add(layers.Conv2D(filters=last_layer_channel, 
        kernel_size=(5, 5), activation='relu'))
GTN.summary()
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_6 (Conv2D)           (None, 28, 28, 64)        15616     
                                                                 
 conv2d_7 (Conv2D)           (None, 24, 24, 32)        51232     
                                                                 
 conv2d_8 (Conv2D)           (None, 20, 20, 1)         801       
                                                                 
=================================================================

Compile and Run

GTN.compile(optimizer='sgd', 
            loss='mean_squared_error', 
            metrics=['MeanSquaredError'])
GTN.fit(train_ds)
7s 93ms/step - loss: 398240.9375 - mean_squared_error: 398240.9062
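
As a side note (my observation, not part of the original answer): the loss looks huge because image_dataset_from_directory yields raw pixel values in [0, 255] and nothing above rescales them. If you want the MSE on a more familiar scale, you could normalize both inputs and targets before training:

# Rescale inputs and targets from [0, 255] to [0, 1].
train_ds = train_ds.map(lambda x, y: (x / 255.0, y / 255.0))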

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: cao-nv
[2] Solution 2