I thought nn.utils.clip_grad_norm(model.parameters(), clip) was supposed to finish the job on its own.
What is this loop for?

for p in model.parameters():
    p.data.add_(-lr, p.grad.data)
Can someone give a more explicit explanation? Is it because, after I use gradient clipping, I may not use the Adam optimizer?
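For context, this is roughly the shape of the training step the snippet comes from. It is a minimal sketch I put together, not my real code: the model, loss, data, lr, and clip values are placeholders, and I wrote it with the underscored clip_grad_norm_ name and the alpha= form of add_, which I believe are equivalent to the calls quoted above.

import torch
import torch.nn as nn

# Placeholder model, loss, and hyperparameters, just for illustration.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
lr, clip = 0.1, 0.25

# Fake batch of data.
inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

model.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()

# Clip the total gradient norm of all parameters in place.
torch.nn.utils.clip_grad_norm_(model.parameters(), clip)

# The loop I am asking about: it changes each parameter by -lr * grad.
for p in model.parameters():
    p.data.add_(p.grad.data, alpha=-lr)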