You need to use both optimizer.step and clipping, right? Because optimizer.step calculates the gradient, and then you want to clip those gradients to prevent vanishing on the next training step?
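
For what it's worth, here is a minimal sketch of the usual ordering, assuming PyTorch. The gradients are computed by loss.backward(), not by optimizer.step(); step() only applies whatever gradients are currently stored. So the clip goes between the two, and it mainly guards against exploding gradients rather than vanishing ones. The toy model, the random data, and max_norm = 1.0 are made up for illustration, not taken from anyone's actual setup.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy model and data (hypothetical, just to have something to train).
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(8, 4) * 100  # large inputs so the raw gradients are big
y = torch.randn(8, 1)

max_norm = 1.0  # assumed clipping threshold; tune for your own model

optimizer.zero_grad()            # clear gradients left from the previous step
loss = loss_fn(model(x), y)
loss.backward()                  # this is where the gradients are computed

# Rescale the gradients in place so their global L2 norm is at most max_norm.
# The return value is the total norm measured BEFORE clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# Recompute the global norm after clipping, only to show the effect.
clipped_norm = sum(p.grad.pow(2).sum() for p in model.parameters()).sqrt()

optimizer.step()                 # applies the (already clipped) gradients

print(f"norm before clipping: {float(total_norm):.4f}")
print(f"norm after clipping:  {float(clipped_norm):.4f}")
```

If the clip were placed after optimizer.step(), it would have no effect on that update, and the clipped gradients would be wiped by the next zero_grad() anyway.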
Proper way to do gradient clipping?