Proper way to do gradient clipping?
No, loss.backward() calculates the gradient, clip_grad_norm_ limits its norm, and optimizer.step() updates the parameters. But yes, you need the first and last.

Best regards

Thomas
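A minimal sketch of the order described above: backward() fills the .grad buffers, clip_grad_norm_ rescales them in place, and step() applies the update. The model, data, and max_norm value here are illustrative placeholders, not from the thread.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

inputs = torch.randn(32, 10)   # placeholder batch
targets = torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()                # 1. compute gradients into param.grad

# 2. rescale all gradients in place if their total norm exceeds max_norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()               # 3. update parameters using the (clipped) gradients
```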
Proper way to do gradient clipping?
Does Variable.grad.data give access to the normalized gradients per batch? If so, how can I access the unnormalized gradients?
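A minimal sketch of how one could check this, using the modern param.grad in place of the old Variable.grad.data API. After backward(), .grad holds the raw, unclipped gradients; they are only rescaled once clip_grad_norm_ is called, so reading them between the two calls gives the unnormalized values. The model shape and max_norm here are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Raw (unclipped) gradient norms, available right after backward()
for name, param in model.named_parameters():
    print(name, param.grad.norm().item())

# clip_grad_norm_ rescales .grad in place and returns the total norm
# as it was *before* clipping
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print("pre-clip total norm:", total_norm.item())

# The same buffers now hold the clipped gradients
for name, param in model.named_parameters():
    print(name, param.grad.norm().item())
```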