I want to train a set of weights using PyTorch, but the weights do not change

I want to reproduce a method from a paper whose code was written in TensorFlow 1.0, and I want to rewrite it in PyTorch. A brief description: I want to learn a set of weights G that can be used to reweight the input data, but during training G does not change at all. This is the TensorFlow code:

    n,p = X_input.shape
    n_e, p_e = X_encoder_input.shape
    
    display_step = 100
    
    X = tf.placeholder("float", [None, p])
    X_encoder = tf.placeholder("float", [None, p_e])
    
    G = tf.Variable(tf.ones([n,1]))
    
    loss_balancing = tf.constant(0, tf.float32)
    for j in range(1,p+1):
        X_j = tf.slice(X_encoder, [j*n,0],[n,p_e])
        I = tf.slice(X, [0,j-1],[n,1])
        balancing_j = (tf.divide(tf.matmul(tf.transpose(X_j), G*G*I),
                                 tf.maximum(tf.reduce_sum(G*G*I), tf.constant(0.1)))
                       - tf.divide(tf.matmul(tf.transpose(X_j), G*G*(1-I)),
                                   tf.maximum(tf.reduce_sum(G*G*(1-I)), tf.constant(0.1))))
        loss_balancing += tf.norm(balancing_j,ord=2)
    loss_regulizer = (tf.reduce_sum(G*G)-n)**2 + 10*(tf.reduce_sum(G*G-1))**2
    
    loss = loss_balancing + 0.0001*loss_regulizer
    
    optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)
    
    saver = tf.train.Saver()
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

And this is my PyTorch rewrite:

    n, p = x_test.shape
    loss_balancing = torch.tensor(0.0)
    G = nn.Parameter(torch.ones([n,1]))
    optimizer = torch.optim.RMSprop([G] , lr=0.001)

    for i in range(num_steps):

        for j in range(1, p+1):
            x_j = x_all_encoder[j * n : j*n + n , :]
            I = x_test[0:n , j-1:j]
            balancing_j = torch.divide(torch.matmul(torch.transpose(x_j,0,1) , G * G * I) ,
                                       torch.maximum( (G * G * I).sum() ,
                                                     torch.tensor(0.1) -
                                                     torch.divide(torch.matmul(torch.transpose(x_j,0,1) ,G * G * (1-I)),
                                                                  torch.maximum( (G*G*(1-I)).sum() , torch.tensor(0.1) )
                                                                 )
                                                    )
                                      )
            loss_balancing += nn.Parameter(torch.norm(balancing_j))

        loss_regulizer = nn.Parameter(((G * G) - n).sum() ** 2 + 10 * ((G * G - 1).sum()) ** 2)
        loss = nn.Parameter( loss_balancing + 0.0001 * loss_regulizer )

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if i % 100 == 0:
            print('Loss:{:.4f}'.format(loss.item()))

The result is that G.grad is None. I want to know how to update G by iteration so that it minimizes the loss. Thanks.



Solution 1:[1]

Firstly, please provide a minimal reproducible example. It will make it much easier for people to answer your question.

Since G.grad has no value, loss.backward() did not work properly.
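One pattern worth checking in the posted code is the use of nn.Parameter around already-computed tensors (loss_balancing, loss_regulizer, loss). Wrapping a computed tensor in nn.Parameter creates a brand-new leaf tensor that is detached from the computation graph, and that alone is enough to leave G.grad as None. A minimal standalone sketch, not your exact code:

```python
import torch
import torch.nn as nn

G = nn.Parameter(torch.ones(3, 1))

# Connected case: the loss is built directly from G, so backward() reaches G.
loss = (G * G).sum()
loss.backward()
connected_grad = G.grad          # tensor of 2s, since d(g^2)/dg = 2g

G.grad = None                    # reset for the second experiment

# Detached case: re-wrapping the computed loss in nn.Parameter creates a
# brand-new leaf tensor; backward() stops there and never reaches G.
wrapped_loss = nn.Parameter((G * G).sum())
wrapped_loss.backward()
detached_grad = G.grad           # still None
```

If that turns out to be the cause, using the computed loss tensors directly, without the nn.Parameter wrapper, restores the path from loss back to G.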

Gradient computation can be disturbed by many factors, but in this case I suspect the maximum operation in your code prevents the backward flow, since the maximum operation is not differentiable everywhere.

To check whether this hypothesis is correct, you could inspect the gradient of a tensor created after the maximum operation. I can't do this myself because the provided code is not executable on its own (x_test, x_all_encoder, and num_steps are not defined).
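For that kind of check, PyTorch lets you keep the gradient of a non-leaf tensor with retain_grad(). The sketch below uses made-up shapes and a simplified loss, not the computation from the question; if the intermediate tensor's .grad is populated after backward(), the gradient did flow through torch.maximum:

```python
import torch

G = torch.ones(4, 1, requires_grad=True)

# A denominator guarded by a maximum, mimicking the pattern in the question.
denom = torch.maximum((G * G).sum(), torch.tensor(0.1))
denom.retain_grad()              # keep .grad for this non-leaf tensor

loss = (G / denom).norm()
loss.backward()

# denom.grad and G.grad can now be inspected directly.
```

In this toy case both gradients are populated: torch.maximum is differentiable almost everywhere and routes the gradient to whichever argument was larger, so the maximum by itself does not necessarily cut the graph.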

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Hayoung