
Proper way to do gradient clipping?


@ntubertchen
Hi,
Use torch.nn.utils.clip_grad_norm to keep the gradients within a specific range (clip). In RNNs the gradients tend to grow very large (this is called the 'exploding gradient problem'), and clipping helps prevent that from happening. It is worth looking at the implementation, because it teaches us that:

  1. “The norm is computed over all gradients together, as if they were concatenated into a single vector.”
  2. You can control the norm type (lp-norm, with p defaulting to 2; or the L-inf norm).
  3. All of the gradient coefficients are multiplied by the same clip_coef.
  4. clip_grad_norm is invoked after all of the gradients have been computed, i.e. between loss.backward() and optimizer.step(). During loss.backward() the gradients that are propagated backwards are not clipped; only once the backward pass completes is clip_grad_norm() applied. optimizer.step() will then use the clipped gradients.
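Point 4 above can be sketched as a minimal training step. This is only an illustration, not code from the original thread: the toy model, data, and max_norm=1.0 are placeholder choices.

```python
import torch
import torch.nn as nn

# A toy model and batch, just to make the step runnable.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()  # gradients are computed here, not yet clipped

# Clip the global L2 norm of all gradients to at most 1.0.
# (In current PyTorch the in-place variant is named clip_grad_norm_.)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()  # the step uses the clipped gradients
```

Note that clip_grad_norm_ also returns the total gradient norm as it was before clipping, which is handy for logging.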

Regarding the code you ask about:

for p in model.parameters():
    p.data.add_(-lr, p.grad.data)

This iterates across all of the model.parameters() and performs an in-place multiply-add on each of the parameter tensors.
p.data.add_ is functionally equivalent to:

p.data = p.data + (-lr * p.grad.data)

In other words, this performs a similar function to optimizer.step(), using the gradients to update the model parameters, but without the extra sophistication of a torch.optim.Optimizer. If you use the above code, you should not also use an optimizer (and vice versa).
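As a quick check of that equivalence (a minimal sketch; the tensor size and lr are arbitrary), the manual update and a vanilla, momentum-free torch.optim.SGD step produce the same parameters:

```python
import torch

lr = 0.1

# Two identical parameters with identical gradients.
p1 = torch.nn.Parameter(torch.randn(3))
p2 = torch.nn.Parameter(p1.detach().clone())
grad = torch.randn(3)
p1.grad = grad.clone()
p2.grad = grad.clone()

# Manual update, as in the snippet above
# (current PyTorch spells it add_(tensor, alpha=scalar)).
p1.data.add_(p1.grad.data, alpha=-lr)  # p1 = p1 - lr * grad

# The same update via a plain SGD optimizer.
opt = torch.optim.SGD([p2], lr=lr)
opt.step()

print(torch.allclose(p1, p2))  # True
```

The equivalence only holds for plain SGD; once you add momentum, weight decay, or switch to an optimizer like Adam, the optimizer's step is no longer a bare multiply-add.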

Cheers,
Neta
