Quick question about this @apaszke: are the `Variable.grad.data` tensors
that we pass to our clip function part of the model object, or (if we use a different optimizer) the optimizer object?
In other words, does the optimizer itself call backward()? If so, should the code below pass the optimizer to the clip function instead of the model?
optimizer.zero_grad()
output, hidden = model(data, hidden)
loss = criterion(output.view(-1, ntokens), targets)
loss.backward()
clipped_lr = lr * clip_gradient(model, clip)
for p in model.parameters():
    p.data.add_(-clipped_lr, p.grad.data)
optimizer.step()
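For what it's worth, here is a minimal sketch of what such a `clip_gradient(model, clip)` could look like (this is a hypothetical implementation, assuming it returns a rescaling coefficient computed from the total gradient norm). The key point is that `.grad` is stored on the parameters themselves, and the model owns the parameters, so passing the model is enough; the optimizer never calls `backward()`, it only reads the gradients that `loss.backward()` already populated:

```python
import torch
import torch.nn as nn

def clip_gradient(model, clip):
    # Hypothetical sketch: compute the global L2 norm of all parameter
    # gradients and return a coefficient that rescales the step so the
    # effective gradient norm does not exceed `clip`.
    total_norm = 0.0
    for p in model.parameters():
        if p.grad is not None:
            # .grad lives on the parameter, not on the optimizer
            total_norm += float(p.grad.data.norm()) ** 2
    total_norm = total_norm ** 0.5
    return min(1.0, clip / (total_norm + 1e-6))

# Usage: gradients exist on model.parameters() after backward()
model = nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).sum()
loss.backward()
coef = clip_gradient(model, clip=0.25)  # a value in (0, 1]
```

So whether you use SGD, Adam, or anything else, the clip function only needs the model (or any iterable of parameters).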