Sliding window input (image sequence) for convolutional neural network

I am currently trying to feed an image sequence as a single input entity to my CNN. I found the NumPy utility numpy.lib.stride_tricks.sliding_window_view.

My image data wrapper array has shape (num_of_images, height, width, channels). I would like to group 5 images together into a single input array of shape (5, height, width, channels), which would give a wrapper array of shape (num_of_images/5, 5, height, width, channels). However, I am struggling to use the sliding window view. Can someone enlighten me?

Bonus question: Each of the images has an associated label. I am unsure how to treat these labels when dealing with an image sequence.

Thank you in advance!

Solution 1:^[1]

It just depends on how much you collect from the sources:

import numpy as np
import matplotlib.pyplot as plt

# list_pictures is assumed to be a list of image file paths
image_1 = plt.imread(list_pictures[0])
image_2 = plt.imread(list_pictures[1])
image = np.concatenate((image_1, image_2), axis=0)  # stack the two images vertically
image = np.reshape(image, (1920, 720, 4))  # confirm the input image shape
print(image.shape)  # (1920, 720, 4)

shape = (64, 64, 4)
v = np.lib.stride_tricks.sliding_window_view(image, shape)
print(v.shape)  # (1857, 657, 1, 64, 64, 4)

# show the top-left pixel of every 64x64 window
plt.imshow(np.reshape(v[:, :, :, 0, 0], (1857, 657, 4)))
plt.show()
plt.close()

input('...')  # pause before exiting
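
The example above slides a spatial window over one image. For the array layout in the question, the same utility can be pointed at the image axis instead. The following is only a sketch under assumptions: sequence_windows and its step parameter are names made up for illustration, and NumPy 1.20 or newer is assumed, since that is when sliding_window_view was added.

```python
import numpy as np


def sequence_windows(images, window_size, step=1):
    """Group consecutive images into windows along axis 0.

    images: array of shape (num_of_images, height, width, channels)
    returns: view of shape (num_windows, window_size, height, width, channels)
    """
    images = np.asarray(images)
    # Windowing over axis 0 only; NumPy appends the window axis at the end:
    # (num_of_images - window_size + 1, height, width, channels, window_size)
    v = np.lib.stride_tricks.sliding_window_view(images, window_size, axis=0)
    # Move the window axis next to the batch axis.
    v = np.moveaxis(v, -1, 1)
    # step=1 keeps every overlapping window; step=window_size keeps
    # only the non-overlapping ones.
    return v[::step]


# Dummy data standing in for the real images: 20 images of 2x3 pixels, 1 channel.
x = np.arange(20 * 2 * 3 * 1).reshape(20, 2, 3, 1)

overlapping = sequence_windows(x, 5)
print(overlapping.shape)  # (16, 5, 2, 3, 1)

non_overlapping = sequence_windows(x, 5, step=5)
print(non_overlapping.shape)  # (4, 5, 2, 3, 1)
```

With step=5 the first dimension is num_of_images // 5, which matches the shape asked for in the question. The result is a read-only view on the original data, so call .copy() on it if you need to write into the windows.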

Solution 2:^[2]

I did it without the numpy utility like so:

im_pixels is an array containing n 1-D arrays, each with im_height*im_width entries. The trailing 1 in the shapes below is the single channel (greyscale).

import numpy as np

def prep_images(im_pixels, window_size, im_height, im_width, pixel_normalizer):
    # normalize_pixels is my own helper and is not shown here
    num_windows = len(im_pixels) - window_size + 1  # only full windows
    images = np.empty((num_windows, window_size, im_height, im_width, 1))

    for i in range(num_windows):
        frame = im_pixels[i:i + window_size]
        for j, image in enumerate(frame):
            # do not write the result back into frame: a slice of a NumPy
            # array is a view, so overlapping windows would normalize twice
            normalized = normalize_pixels(image, pixel_normalizer)
            images[i, j] = np.reshape(normalized, (im_height, im_width, 1))
    return images
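
Neither answer covers the bonus question about labels. One possible convention, which is an assumption here and not something either answer states, is to give each window the label of its last image, or to keep all labels of the window when the model predicts one label per frame. The helper name window_labels and its mode argument are hypothetical, and a 1-D array with one label per image is assumed.

```python
import numpy as np


def window_labels(labels, window_size, mode="last"):
    """Align per-image labels with sliding windows of images.

    labels: 1-D array with one label per image
    mode="last": one label per window (the label of its last image)
    mode="all":  all labels of each window, shape (num_windows, window_size)
    """
    labels = np.asarray(labels)
    # Shape (len(labels) - window_size + 1, window_size)
    windows = np.lib.stride_tricks.sliding_window_view(labels, window_size)
    if mode == "last":
        return windows[:, -1].copy()
    if mode == "all":
        return windows.copy()
    raise ValueError("mode must be 'last' or 'all'")


# Dummy labels standing in for the real ones.
labels = np.arange(10)
print(window_labels(labels, 5))               # [4 5 6 7 8 9]
print(window_labels(labels, 5, "all").shape)  # (6, 5)
```

Window i produced this way lines up with images[i] from a window builder that only keeps full windows of window_size consecutive images. Which convention fits depends on the task, so treat this as a starting point.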

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Martijn Pieters
Solution 2	ABF
