RuntimeError: CUDA out of memory at the end of training and the model doesn't get saved; PyTorch
I'm not very experienced with data science or PyTorch and I'm having trouble getting much of anything to work here (I'm currently building a NN for a segmentation task). There is some kind of memory problem, although it doesn't make sense to me: every epoch uses far less memory than the amount the raised error says it tried to allocate.
import torch
from torch import nn
from torch.autograd import Variable
from torch.nn import Linear, ReLU6, CrossEntropyLoss, Sequential, Conv2d, MaxPool2d, Module, Softmax, Softplus, BatchNorm2d, Dropout, ConvTranspose2d
import torch.nn.functional as F
from torch.nn import LeakyReLU,Tanh
from torch.optim import Adam, SGD
import numpy as np
import cv2 as cv
def train(epoch, model, criterion, x_train, y_train, loss_val):
    model.train()
    tr_loss = 0
    # getting the training set
    x_train, y_train = Variable(x_train), Variable(y_train)
    # converting the data into GPU format
    # clearing the Gradients of the model parameters
    optimizer.zero_grad()
    # prediction for training and validation set
    output_train = model(x_train)
    # computing the training and validation loss
    loss_train = criterion(output_train, y_train)
    train_losses.append(loss_train)
    # computing the updated weights of all the model parameters
    loss_train.backward()
    optimizer.step()
    tr_loss = loss_train.item()
    return loss_train
    # printing the validation loss
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 96, (3, 3), padding=1)
        self.conv11 = nn.Conv2d(96, 96, (3, 3), padding=1)
        self.conv12 = nn.Conv2d(96, 96, (3, 3), padding=1)
        self.pool = nn.MaxPool2d((2, 2), 2)
        self.conv2 = nn.Conv2d(96, 192, (3, 3), padding=1)
        self.conv21 = nn.Conv2d(192, 192, (3, 3), padding=1)
        self.conv22 = nn.Conv2d(192, 192, (3, 3), padding=1)
        self.b = BatchNorm2d(96)
        self.b1 = BatchNorm2d(192)
        self.b2 = BatchNorm2d(384)
        self.conv3 = nn.Conv2d(192, 384, (3, 3), padding=1)
        self.conv31 = nn.Conv2d(384, 384, (3, 3), padding=1)
        self.conv32 = nn.Conv2d(384, 384, (3, 3), padding=1)
        self.lin1 = nn.Linear(384*16*16, 256*2*2, 1)
        self.lin2 = nn.Linear(256*2*2, 16*16, 1)
        self.uppool = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.upconv1 = nn.ConvTranspose2d(385, 192, (3, 3), padding=1)
        self.upconv11 = nn.ConvTranspose2d(192, 32, (3, 3), padding=1)
        self.upconv12 = nn.ConvTranspose2d(32, 1, (3, 3), padding=1)
        self.upconv2 = nn.ConvTranspose2d(193, 96, (3, 3), padding=1)
        self.upconv21 = nn.ConvTranspose2d(96, 16, (3, 3), padding=1)
        self.upconv22 = nn.ConvTranspose2d(16, 1, (3, 3), padding=1)
        self.upconv3 = nn.ConvTranspose2d(97, 16, (3, 3), padding=1)
        self.upconv4 = nn.ConvTranspose2d(16, 8, (3, 3), padding=1)
        self.upconv6 = nn.ConvTranspose2d(8, 1, (3, 3), padding=1)

    def forward(self, x):
        m = Tanh()
        x1 = self.b(m(self.conv12(m(self.conv11(m(self.conv1(x)))))))
        x = self.pool(x1)
        x2 = self.b1(m(self.conv22(m(self.conv21(m(self.conv2(x)))))))
        x = self.pool(x2)
        x3 = self.b2(m(self.conv32(m(self.conv31(m(self.conv3(x)))))))
        x = self.pool(x3)
        x = x.view(-1, 16*16*384)
        x = m(self.lin1(x))
        x = m(self.lin2(x))
        x = x.view(1, 1, 16, 16)
        x = torch.cat((x, self.pool(x3)), 1)
        x = self.uppool(m(self.upconv12(m(self.upconv11(m(self.upconv1(x)))))))
        x = torch.cat((x, self.pool(x2)), 1)
        x = self.uppool(m(self.upconv22(m(self.upconv21(m(self.upconv2(x)))))))
        x = torch.cat((x, self.pool(x1)), 1)
        x = self.uppool(m(self.upconv3(x)))
        x = m(self.upconv4(x))
        l = Softplus()
        x = l(self.upconv6(x))
        return x
train_data = []
for path in range(1000):
    n = "".join(["0" for i in range(5-len(str(path)))]) + str(path)
    paths = "00000\\" + n + ".png"
    train_data.append(cv.imread(paths))
for path in range(2000, 3000):
    n = "".join(["0" for i in range(5-len(str(path)))]) + str(path)
    paths = "02000\\" + n + ".png"
    train_data.append(cv.imread(paths))
train_output = []
for path in range(1, 2001):
    n = "outputs\\" + str(path) + ".jpg"
    train_output.append(cv.imread(n))
data = torch.from_numpy((np.array(train_data, dtype=float).reshape(2000, 3, 128, 128)/255)).reshape(2000, 3, 128, 128)
data_cuda = torch.tensor(data.to('cuda'), dtype=torch.float32)
output = torch.from_numpy(np.array(train_output, dtype=float).reshape(2000, 3, 128, 128))[:, 2].view(2000, 1, 128, 128)*2
output_cuda = torch.tensor(output.to('cuda'), dtype=torch.float32)
model = Net()
optimizer = Adam(model.parameters(), lr=0.1)
criterion = nn.BCEWithLogitsLoss()
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
print(model)
epochs = 3
n_epochs = 1
train_losses = []
val_losses = []
for epoch in range(n_epochs):
    loss_train = 0
    for i in range(data.shape[0]):
        loss_train1 = train(epoch, model, criterion, data_cuda[i].reshape(1, 3, 128, 128), output_cuda[i].reshape(1, 1, 128, 128), train_losses)
        loss_train += loss_train1
    print('Epoch : ', epoch+1, '\t', 'loss :', loss_train/data.shape[0])
with torch.no_grad():
    torch.save(model.state_dict(), "C:\\Users\\jugof\\Desktop\\Python\\pytorch_models")
    a = np.array(model(data_cuda).to('cpu').numpy())*255
    cv.imshow('', a.reshape(128, 128))
    cv.waitKey(0)
Here is the error:
PS C:\Users\jugof\Desktop\Python> & C:/Users/jugof/anaconda3/python.exe c:/Users/jugof/Desktop/Python/3d_visual_effect1.py
c:/Users/jugof/Desktop/Python/3d_visual_effect1.py:98: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  data_cuda=torch.tensor(data.to('cuda'), dtype=torch.float32)
c:/Users/jugof/Desktop/Python/3d_visual_effect1.py:101: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  output_cuda=torch.tensor(output.to('cuda'),dtype=torch.float32)
Epoch :  1 	 loss : tensor(0.6933, device='cuda:0', grad_fn=<...>)
Traceback (most recent call last):
  File "c:/Users/jugof/Desktop/Python/3d_visual_effect1.py", line 120, in <module>
    a=np.array(model(data_cuda).to('cpu').numpy())*255
  File "C:\Users\jugof\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:/Users/jugof/Desktop/Python/3d_visual_effect1.py", line 62, in forward
    x1=self.b(m(self.conv12(m(self.conv11(m(self.conv1(x)))))))
  File "C:\Users\jugof\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\jugof\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\jugof\anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 395, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 11.72 GiB (GPU 0; 6.00 GiB total capacity; 2.07 GiB already allocated; 1.55 GiB free; 2.62 GiB reserved in total by PyTorch)

I feed the model a numpy array (an image) of shape 128*128 and receive another of the same shape; it's a segmentation model (again).
I was using the Flickr-Faces-HQ (FFHQ) dataset, downsampled to 128*128 - I used the 00000, 01000 and 02000 folders, and the masks (labels) were obtained with OpenCV's haarcascade_eye.
Solution 1:[1]

The problem is your train_losses list, which stores all losses from the beginning of your experiment. If the values you put into it were plain floats, that would not be an issue, but because the train function appends and returns the loss tensor itself rather than a float, you are actually storing loss tensors, with the whole computational graph embedded in them. Indeed, a tensor keeps pointers to all tensors that were involved in its computation, and as long as such a pointer exists, the allocated memory cannot be freed.

So basically, you keep tensors from all epochs alive and prevent PyTorch from cleaning them up; it is effectively a (deliberate) memory leak.
You can very easily monitor this type of issue by running `nvidia-smi -l 1` after starting your experiment: you will watch the memory usage grow linearly until your GPU runs out of memory (`nvidia-smi` is a good tool to keep open whenever you are doing work on your GPU).
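As a complement (not part of the original answer), PyTorch can also report its own allocator statistics, which lets you watch the same growth from inside the script; log_gpu_memory below is a hypothetical helper name:

import torch

def log_gpu_memory(tag=""):
    # Hypothetical helper (not from the original answer): prints PyTorch's view of GPU
    # memory; with the leak described above, the "allocated" number keeps growing.
    print(f"{tag} allocated: {torch.cuda.memory_allocated() / 2**20:.1f} MiB, "
          f"reserved: {torch.cuda.memory_reserved() / 2**20:.1f} MiB")

Calling it at the end of each epoch, e.g. log_gpu_memory(f'epoch {epoch}'), gives the same picture as nvidia-smi without leaving the script.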
To prevent this from happening, simply replace the last line of the train function with return loss_train.item(), and the memory issue will vanish.
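A minimal sketch of the train function with that fix applied; the .item() change is also applied to the train_losses append, since that list is what the first paragraph identifies as holding the tensors. Everything else is the question's code unchanged:

def train(epoch, model, criterion, x_train, y_train, loss_val):
    model.train()
    optimizer.zero_grad()
    # prediction and loss for this sample
    output_train = model(x_train)
    loss_train = criterion(output_train, y_train)
    loss_train.backward()
    optimizer.step()
    # .item() returns a plain Python float, so neither the list nor the caller
    # keeps the loss tensor (and the graph attached to it) alive between iterations
    train_losses.append(loss_train.item())
    return loss_train.item()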
Solution 2:[2]

I have encountered the same problem and I solved it by reducing the batch size!
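In the question's code, training already runs one image at a time; the allocation that actually fails (11.72 GiB) is the final call model(data_cuda), which pushes all 2000 images through the first convolution in one go. Below is a minimal sketch of the same batch-size idea applied to that step, keeping the question's variable names and processing one image per call (forward() hard-codes x.view(1, 1, 16, 16), so larger chunks would require changing that reshape to x.view(-1, 1, 16, 16)):

with torch.no_grad():
    preds = []
    for x in data_cuda:                      # one 3x128x128 image at a time
        preds.append(model(x.unsqueeze(0)).to('cpu'))
    # same post-processing as before, without a single 2000-image batch on the GPU
    a = torch.cat(preds).numpy() * 255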
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Dharman |
Solution 2 | El Mehdi Tafik |