LARS
- class mmpretrain.engine.optimizers.LARS(params, lr, momentum=0, weight_decay=0, dampening=0, eta=0.001, nesterov=False, eps=1e-08)
Implements layer-wise adaptive rate scaling for SGD.
Based on Algorithm 1 of the paper by You, Gitman, and Ginsburg: Large Batch Training of Convolutional Networks.
- Parameters:
params (Iterable) – Iterable of parameters to optimize or dicts defining parameter groups.
lr (float) – Base learning rate.
momentum (float) – Momentum factor. Defaults to 0.
weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.
dampening (float) – Dampening for momentum. Defaults to 0.
eta (float) – LARS coefficient. Defaults to 0.001.
nesterov (bool) – Enables Nesterov momentum. Defaults to False.
eps (float) – A small number to avoid dividing by zero. Defaults to 1e-8.
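For intuition, eta, weight_decay and eps combine into the layer-wise scaling of Algorithm 1, which rescales the base learning rate per parameter tensor. A minimal illustrative sketch of that computation follows; the helper name lars_local_lr is hypothetical and not part of this class:

>>> import torch
>>>
>>> def lars_local_lr(weight, grad, lr, eta=1e-3, weight_decay=0., eps=1e-8):
...     """Layer-wise rate: lr * eta * ||w|| / (||g|| + weight_decay * ||w|| + eps)."""
...     w_norm = torch.norm(weight)
...     g_norm = torch.norm(grad)
...     if w_norm == 0 or g_norm == 0:
...         # Fall back to the plain learning rate for zero-norm tensors.
...         return lr
...     return lr * eta * w_norm / (g_norm + weight_decay * w_norm + eps)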
Example
>>> optimizer = LARS(model.parameters(), lr=0.1, momentum=0.9,
>>>                  weight_decay=1e-4, eta=1e-3)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
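In an MMPretrain config file, the optimizer is usually selected through the optimizer wrapper rather than instantiated directly. A minimal sketch, assuming the standard optim_wrapper convention (the hyperparameter values below are placeholders, not recommended settings):

>>> optim_wrapper = dict(
...     optimizer=dict(
...         type='LARS', lr=0.1, momentum=0.9, weight_decay=1e-4, eta=1e-3))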