Proper way to do gradient clipping?

Is there a proper way to do gradient clipping, for example, with Adam? It seems that the gradient values (Variable.grad.data) should be manipulated (clipped) before calling the optimizer.step() method. I think Variable.grad.data can be modified in place to do gradient clipping. Is it safe to do that?
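
For what it's worth, here is a minimal sketch of where clipping would fit in a training step, using torch.nn.utils.clip_grad_norm_; the model, data, and max_norm value are illustrative assumptions, not taken from the original post:

```python
import torch
import torch.nn as nn

# Hypothetical model, optimizer, and data, just to show the ordering.
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()

# Clip gradients in place *after* backward() and *before* step().
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# Element-wise alternative: torch.nn.utils.clip_grad_value_(model.parameters(), 1.0)

optimizer.step()
```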

Also, is there a reason that the Autograd RNN cells have separate biases for input-to-hidden and hidden-to-hidden? This seems redundant and adds some overhead.
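
For reference, the two bias vectors can be inspected on an nn.LSTMCell. They are simply summed in the gate computations, so in the standard formulation a single bias would suffice; the PyTorch docs note the second bias exists for cuDNN compatibility. The sizes below are hypothetical, chosen only for illustration:

```python
import torch.nn as nn

# Hypothetical sizes, just for inspection.
cell = nn.LSTMCell(input_size=10, hidden_size=20)

# Both bias vectors have shape (4 * hidden_size,); they are added
# together inside the cell update, so only one is strictly needed.
print(cell.bias_ih.shape)  # torch.Size([80])
print(cell.bias_hh.shape)  # torch.Size([80])
```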
