Can I use Layer Normalization with a CNN?

I understand that Layer Normalization is a more modern normalization method than Batch Normalization, and it is very simple to code in TensorFlow. But I think layer normalization was designed for RNNs, and batch normalization for CNNs. Can I use layer normalization with a CNN that performs an image classification task? What are the criteria for choosing between batch normalization and layer normalization?



Solution 1:[1]

You can use layer normalisation in CNNs, but I don't think it is more 'modern' than batch norm. The two normalise along different axes. Layer norm normalises the activations of a layer separately for each example in the batch, collecting statistics over every unit within the layer. Batch norm normalises each activation separately, collecting statistics for that single unit across all examples in the batch.
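A minimal NumPy sketch of the two schemes described above, assuming a feature map laid out as (batch, height, width, channels). The function names are made up for illustration, the learnable scale and shift are left out, and batch norm is written in its usual convolutional form, where a "unit" is a channel and the statistics are shared over spatial positions:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # One mean/variance per channel, computed across the batch
    # (and, for conv feature maps, across spatial positions).
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # One mean/variance per example, computed across all of
    # that example's units (height, width, channels).
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 16))  # (N, H, W, C)

bn_out = batch_norm(x)  # each channel is ~zero-mean over the batch
ln_out = layer_norm(x)  # each example is ~zero-mean over its units
```

The only difference between the two functions is the set of axes the statistics are taken over.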

Batch norm is generally preferred over layer norm because it tries to normalise every individual activation to zero mean and unit variance (a 'unit Gaussian'), while layer norm only does this for the activations of a layer taken together. But if the batch size is too small to collect reasonable statistics, layer norm is preferred.
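The small-batch problem can be seen in the extreme case of a batch of one. This is only a sketch with hypothetical helper names, using plain (batch, units) activations and training-time batch statistics (no running averages):

```python
import numpy as np

def batch_norm_units(x, eps=1e-5):
    # Statistics per unit, across the batch axis.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm_units(x, eps=1e-5):
    # Statistics per example, across the unit axis.
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
single = rng.normal(size=(1, 32))  # batch size 1

# With one example, every unit equals its own batch mean,
# so batch norm wipes out the signal entirely.
bn_single = batch_norm_units(single)

# Layer norm never looks at the batch axis, so an example's output
# is the same whether it is alone or part of a larger batch.
ln_single = layer_norm_units(single)
batch = np.concatenate([single, rng.normal(size=(7, 32))], axis=0)
ln_batch = layer_norm_units(batch)
```

Here `bn_single` is all zeros, while `ln_batch[0]` matches `ln_single[0]`.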

Solution 2:[2]

I would also like to add that, as mentioned in the original Layer Norm paper (page 10, section 6.7), layer norm is not advised for CNNs, and the authors say that more research has to be done before it works well there.

Also, a heads-up: for RNNs, layer norm seems a better choice than batch norm, because training cases in the same minibatch can have different lengths.
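A rough illustration of the variable-length point, with made-up sequence lengths and random values standing in for RNN hidden states. Statistics taken per time step across the batch see fewer and fewer real examples at later steps, whereas layer norm only needs the hidden vector at one (sequence, step) position:

```python
import numpy as np

rng = np.random.default_rng(0)
lengths = np.array([5, 3, 2])  # hypothetical sequence lengths in one minibatch
max_len, hidden = int(lengths.max()), 4
h = rng.normal(size=(len(lengths), max_len, hidden))  # (batch, time, hidden)

# mask[i, t] is True where sequence i has a real (non-padded) step t.
mask = np.arange(max_len)[None, :] < lengths[:, None]

# Number of real examples available for batch statistics at each step.
valid_per_step = mask.sum(axis=0)  # -> [3 3 2 1 1]

# Layer norm: statistics over the hidden axis only, one position at a time.
mean = h.mean(axis=-1, keepdims=True)
var = h.var(axis=-1, keepdims=True)
h_ln = (h - mean) / np.sqrt(var + 1e-5)
```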

Solution 3:[3]

In more recent work [1], it was found that you can use LayerNorm in CNNs without degrading accuracy, though it depends on the model architecture. Liu et al. [1] found while developing ConvNeXt that "Directly substituting LN for BN in the original ResNet will result in suboptimal performance" but they observed that their ConvNeXt "model does not have any difficulties training with LN; in fact, the performance is slightly better".

It would be great if there were a better explanation as to why...
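For a concrete picture of layer norm on a convolutional feature map, here is a minimal NumPy sketch that normalises each spatial position over its channels only. As far as I know this is the variant ConvNeXt uses, but treat that as an assumption. The function name, shapes and `eps` are illustrative, not taken from the paper:

```python
import numpy as np

def channel_layer_norm(x, gamma, beta, eps=1e-6):
    # x: (N, H, W, C). Statistics are taken over the channel axis only,
    # separately for every example and every spatial position.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(2, 7, 7, 96))
gamma = np.ones(96)   # learnable per-channel scale, at its initial value
beta = np.zeros(96)   # learnable per-channel shift, at its initial value

out = channel_layer_norm(feature_map, gamma, beta)
```

Note that this differs from normalising over height, width and channels together. Which axes to include is a design choice, and no batch statistics are involved either way.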

  1. Liu et al. A ConvNet for the 2020s. https://arxiv.org/pdf/2201.03545.pdf

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 vijay m
Solution 2
Solution 3 hendryx