Loss not decreasing - PyTorch
I am using Dice loss for my implementation of a Fully Convolutional Network (FCN) that involves hypernetworks. The model has two inputs and one output, which is a binary segmentation map. The model is updating its weights, but the loss is constant. It does not even overfit on only three training examples.
I have tried other loss functions as well, such as Dice + binary cross-entropy loss, Jaccard loss, and MSE loss, but the loss stays almost constant. I have also tried almost every activation function, like ReLU, LeakyReLU, and Tanh. Moreover, I have to use sigmoid at the output because I need my outputs to be in the range [0, 1]. The learning rate is 0.01, and I have tried other learning rates as well: 0.0001, 0.001, and 0.1. No matter what loss the training starts at, it always ends up at this value.
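For reference, this is the standard soft Dice loss formulation I am following (a sketch; the helper name and exact implementation details here are illustrative, and my actual code may differ):

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary segmentation.

    pred:   sigmoid outputs in [0, 1], shape (N, 1, H, W)
    target: binary ground-truth masks, same shape
    """
    pred = pred.reshape(pred.size(0), -1)
    target = target.reshape(target.size(0), -1)
    intersection = (pred * target).sum(dim=1)
    union = pred.sum(dim=1) + target.sum(dim=1)
    dice = (2 * intersection + eps) / (union + eps)  # per-sample Dice score
    return 1 - dice.mean()                           # loss = 1 - mean Dice
```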
This shows the gradients for three training examples, followed by the overall loss:
tensor(0.0010, device='cuda:0')
tensor(0.1377, device='cuda:0')
tensor(0.1582, device='cuda:0')
Epoch 9, Overall loss = 0.9604763123724196, mIOU=0.019766070265581623
tensor(0.0014, device='cuda:0')
tensor(0.0898, device='cuda:0')
tensor(0.0455, device='cuda:0')
Epoch 10, Overall loss = 0.9616242945194244, mIOU=0.01919178702228237
tensor(0.0886, device='cuda:0')
tensor(0.2561, device='cuda:0')
tensor(0.0108, device='cuda:0')
Epoch 11, Overall loss = 0.960331304506822, mIOU=0.01983801422510155
I expect the loss to converge within a few epochs. What should I do?
Solution 1:[1]
@Muhammad Hamza Mughal, you've got to add the code of at least your `forward` and `train` functions for us to pinpoint the issue; @Jatentaki is right, there are so many things that could mess up ML/DL code. Even I moved to PyTorch from Keras recently, and it took some time to get used to it. But here are the things I'd do:
1) As you're dealing with images, try to pre-process them a bit (rotation, normalization, Gaussian noise, etc.).
2) Zero the gradients of your optimizer at the beginning of each batch you fetch, and also step the optimizer after you have calculated the loss and called `loss.backward()`.
3) Add a weight decay term to your optimizer call, typically L2; as you're dealing with convolutional networks, a decay term of 5e-4 or 5e-5 is common.
4) Add a learning rate scheduler to your optimizer, to change the learning rate if there's no improvement over time. (All four points are sketched in the code below.)
A full working example is beyond the scope of an answer here; it's up to the practitioner to work out how to implement all this in their own code. Hope this helps.
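For orientation, here is a minimal training-loop sketch covering the four points above; the tiny `Conv2d` model, the random tensors, and the choice of SGD with `ReduceLROnPlateau` are placeholder assumptions rather than a prescription:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Point 1 (pre-processing/augmentation) would normally live in your Dataset;
# random tensors stand in for real images and binary masks here.
images = torch.randn(8, 1, 32, 32)
masks = torch.randint(0, 2, (8, 1, 32, 32)).float()
loader = DataLoader(TensorDataset(images, masks), batch_size=4)

model = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # placeholder for your FCN
criterion = nn.BCEWithLogitsLoss()                 # applies sigmoid internally

# Point 3: weight decay (L2 regularization), e.g. 5e-4 for conv nets.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)

# Point 4: lower the learning rate when the loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5)

for epoch in range(20):
    epoch_loss = 0.0
    for inputs, targets in loader:
        optimizer.zero_grad()              # Point 2: zero grads every batch
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()                   # Point 2: step after backward()
        epoch_loss += loss.item()
    scheduler.step(epoch_loss)             # Point 4: scheduler watches the loss
```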
Solution 2:[2]
It's not really a question for Stack Overflow. There are a million things which could be wrong, and it's usually not possible to post enough code to allow us to pinpoint the issue; even if it were, nobody could be bothered to read that much.
That being said, there are some general guidelines which often work for me.
- Try reducing the problem. If you replace your network with a single convolutional layer, will it converge? If yes, apparently something's wrong with your network.
- Look at the data as you feed it, as well as the labels (matplotlib plots, etc.). Perhaps you're misaligning input with output (cropping issues, etc.) or your data augmentation is way too strong.
- Look for, well..., bugs. Perhaps you're returning `torch.sigmoid(x)` from your network and then feeding it into `torch.nn.functional.binary_cross_entropy_with_logits` (effectively applying sigmoid twice; see the snippet after this list). Maybe your last layer is `ReLU` and your network just cannot, by construction, output negative values where you would expect them.
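A minimal illustration of the double-sigmoid bug mentioned above (the tensors here are random placeholders):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 1, 32, 32)                    # raw network outputs
targets = torch.randint(0, 2, (4, 1, 32, 32)).float()

# Buggy: sigmoid applied twice -- once here, once inside the loss.
probs = torch.sigmoid(logits)
buggy = F.binary_cross_entropy_with_logits(probs, targets)

# Correct: either pass raw logits to the *_with_logits loss...
fixed = F.binary_cross_entropy_with_logits(logits, targets)
# ...or apply sigmoid once and use the plain BCE loss.
fixed_alt = F.binary_cross_entropy(probs, targets)

print(buggy.item(), fixed.item(), fixed_alt.item())   # fixed == fixed_alt
```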
Finally, I've personally never had much success training with Dice as the primary loss function, so I would definitely try to get it working with cross-entropy first, and then move on to Dice.
Solution 3:[3]
@MuhammadHamzaMughal, since you are using sigmoid to generate predictions, have you made sure that the target attributes in the ground truth/training data/validation data are all in the range [0, 1]?
Normalize the data with min-max normalization so that it is in the [0, 1] range.
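A quick sketch of min-max normalization on a tensor (the values here are arbitrary examples):

```python
import torch

x = torch.randn(1, 32, 32) * 50 + 100          # tensor with an arbitrary range

# Min-max normalization: rescale all values into [0, 1].
x_min, x_max = x.min(), x.max()
x_norm = (x - x_min) / (x_max - x_min + 1e-8)  # epsilon guards divide-by-zero

print(x_norm.min().item(), x_norm.max().item())  # ~0.0 and ~1.0
```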
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Amith Adiraju |
| Solution 2 | Jatentaki |
| Solution 3 | Maria |