The one that comes with nn.utils clips in proportion to the magnitude of the gradients. So you'd want to make sure the clipping threshold is not too small for your particular model, as Adam said (I think :p). The old-fashioned way of clipping/clamping is:
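def gradClamp(parameters, clip=5):
    # Clamp every gradient element into [-clip, clip], in place.
    for p in parameters:
        if p.grad is not None:  # skip parameters that received no gradient
            p.grad.data.clamp_(min=-clip, max=clip)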
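
To make the difference between the two approaches concrete, here is a rough plain-Python sketch (no torch, just lists of floats). The names clip_by_norm and clamp_by_value are made up for illustration, and the 1e-6 term is only an assumed guard against division by zero, so treat this as a toy model of the idea rather than the library's actual code.

```python
import math

def clip_by_norm(grads, max_norm):
    # Hypothetical helper: rescale ALL gradients by one common factor
    # so that their overall L2 norm ends up at most max_norm.
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))  # never scale up
    return [g * scale for g in grads]

def clamp_by_value(grads, clip):
    # Hypothetical helper: truncate each element independently to [-clip, clip].
    return [max(-clip, min(clip, g)) for g in grads]

grads = [30.0, -40.0, 0.5]
print(clip_by_norm(grads, 5.0))    # direction kept, every entry shrunk
print(clamp_by_value(grads, 5.0))  # large entries cut off, small one untouched
```

With these numbers the norm-based version shrinks everything by about a factor of ten (roughly 3, -4 and 0.05), so the small 0.5 entry gets squashed too, while clamping leaves 0.5 alone and just cuts the two big entries to 5 and -5. That is why a threshold that is too small for your model hurts more with norm clipping: it scales down the whole gradient, not only the outliers.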