
Proper way to do gradient clipping?

I tested this on CPU and the improvement was no more than a few milliseconds (for anyone who may try to implement an LSTM for benchmarking :) ). I think a few extra additions are insignificant compared with the more expensive computations, such as multiplying weight matrices, applying nonlinear activation functions, or even the Python loop itself.
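
For anyone who wants to check this on their own machine, here is a rough sketch of the kind of comparison I mean. It is not the benchmark I actually ran: it uses only the standard library, and the hidden size of 64, the helper names (`add_vectors`, `matvec`, `time_call`), and the repeat count are all made up for illustration. It times one extra elementwise addition against one matrix-vector product of the same width.

```python
import random
import timeit

HIDDEN = 64  # hypothetical hidden size, chosen only for illustration


def add_vectors(a, b):
    """Elementwise addition of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def time_call(fn, *args, repeats=200):
    """Return the average number of seconds per call over `repeats` calls."""
    return timeit.timeit(lambda: fn(*args), number=repeats) / repeats


random.seed(0)
vec_a = [random.random() for _ in range(HIDDEN)]
vec_b = [random.random() for _ in range(HIDDEN)]
weights = [[random.random() for _ in range(HIDDEN)] for _ in range(HIDDEN)]

add_time = time_call(add_vectors, vec_a, vec_b)
mul_time = time_call(matvec, weights, vec_a)

print(f"extra addition:  {add_time * 1e6:.1f} us per call")
print(f"matrix multiply: {mul_time * 1e6:.1f} us per call")
```

The absolute numbers depend on the machine and say nothing about a real framework, where both operations run in optimized kernels. The point is only the relative cost: the addition touches `HIDDEN` elements while the product touches `HIDDEN * HIDDEN`.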
