Lamb

class mmpretrain.engine.optimizers.Lamb(params, lr=0.001, bias_correction=True, betas=(0.9, 0.999), eps=1e-06, weight_decay=0.01, grad_averaging=True, max_grad_norm=1.0, trust_clip=False, always_adapt=False)[source]

A pure PyTorch variant of the FusedLAMB (NvLamb variant) optimizer.

This class is copied from timm. LAMB was proposed in Large Batch Optimization for Deep Learning: Training BERT in 76 minutes. A usage sketch is given after the parameter list below.

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float, optional) – learning rate. (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its norm. (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability. (default: 1e-6)

  • weight_decay (float, optional) – weight decay (L2 penalty). (default: 0.01)

  • grad_averaging (bool, optional) – whether to apply (1-beta2) to the gradient when calculating running averages of the gradient. (default: True)

  • max_grad_norm (float, optional) – value used to clip global grad norm (default: 1.0)

  • trust_clip (bool, optional) – enable LAMBC trust ratio clipping. (default: False)

  • always_adapt (bool, optional) – apply the adaptive learning rate even when a parameter group's weight decay is 0.0. (default: False)
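
The class behaves like any other torch.optim optimizer. The following is a minimal sketch, using a toy model and illustrative hyperparameter values that are not taken from any mmpretrain config:

import torch
from mmpretrain.engine.optimizers import Lamb

# Any nn.Module works; a single linear layer keeps the example small.
model = torch.nn.Linear(10, 2)

# Construct the optimizer directly from the model's parameters,
# matching the signature documented above.
optimizer = Lamb(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    weight_decay=0.01,
    max_grad_norm=1.0,
)

# One training step: forward, backward, update.
inputs = torch.randn(4, 10)
loss = model(inputs).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()

When training through an MMEngine-style config, the optimizer is normally selected by name instead, e.g. optim_wrapper = dict(optimizer=dict(type='Lamb', lr=1e-3, weight_decay=0.01)); the registry key 'Lamb' is assumed here from the class name.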

step(closure=None)[source]

Performs a single optimization step.

Parameters:

  • closure (callable, optional) – A closure that reevaluates the model and returns the loss.
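
If a closure is passed, step() calls it to re-evaluate the model and obtain the loss before applying the update. A small self-contained sketch, again with an illustrative toy model:

import torch
from mmpretrain.engine.optimizers import Lamb

model = torch.nn.Linear(10, 2)
optimizer = Lamb(model.parameters(), lr=1e-3)
inputs = torch.randn(4, 10)

def closure():
    # Re-evaluate the model and return the loss so step() can use it.
    optimizer.zero_grad()
    loss = model(inputs).sum()
    loss.backward()
    return loss

loss = optimizer.step(closure)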