LearningRateDecayOptimWrapperConstructor
- class mmpretrain.engine.optimizers.LearningRateDecayOptimWrapperConstructor(optim_wrapper_cfg, paramwise_cfg=None)[source]
Different learning rates are set for different layers of the backbone.
By default, each parameter shares the same optimizer settings, and we provide an argument
paramwise_cfg
to specify parameter-wise settings. It is a dict and may contain the following fields:
- layer_decay_rate (float): The learning rate of a parameter is multiplied by this factor according to the layer depth of the parameter. Usually it is less than 1, so that earlier layers get a lower learning rate (see the sketch after this list). Defaults to 1.
- bias_decay_mult (float): Multiplier applied to the weight decay of all bias parameters (except for those in normalization layers).
- norm_decay_mult (float): Multiplier applied to the weight decay of all weight and bias parameters of normalization layers.
- flat_decay_mult (float): Multiplier applied to the weight decay of all one-dimensional parameters.
- custom_keys (dict): Specifies parameter-wise settings by keys. If one of the keys in custom_keys is a substring of a parameter's name, the settings of that parameter are taken from custom_keys[key], and other settings like bias_decay_mult are ignored. Each value should be a dict and may contain the field decay_mult. (lr_mult is disabled in this constructor.)
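To make the effect of layer_decay_rate concrete, the snippet below is a minimal, self-contained sketch of the usual layer-wise decay rule. The function name, the exponent formula, and the way layer depth is assigned here are illustrative assumptions, not the constructor's exact implementation.

```python
# Illustrative sketch of layer-wise lr decay (not the constructor's exact code).
# It assumes a backbone with `num_layers` blocks and that each parameter's
# layer depth is already known (depth 0 for embeddings, num_layers for the
# last block); the real depth rules and exponent may differ.

def layer_wise_lr(base_lr, layer_depth, num_layers, layer_decay_rate):
    """Scale the base lr so that earlier layers receive smaller values."""
    # Earlier layers are decayed more times than later ones (assumed formula).
    scale = layer_decay_rate ** (num_layers + 1 - layer_depth)
    return base_lr * scale


base_lr, num_layers, decay = 4e-3, 12, 0.75
for depth in (0, 6, 12):
    lr = layer_wise_lr(base_lr, depth, num_layers, decay)
    print(f'layer depth {depth:2d}: lr = {lr:.2e}')
```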
Example:
In the config file, you can use this constructor as below:
```python
optim_wrapper = dict(
    optimizer=dict(
        type='AdamW',
        lr=4e-3,
        weight_decay=0.05,
        eps=1e-8,
        betas=(0.9, 0.999)),
    constructor='LearningRateDecayOptimWrapperConstructor',
    paramwise_cfg=dict(
        layer_decay_rate=0.75,  # layer-wise lr decay factor
        norm_decay_mult=0.,
        flat_decay_mult=0.,
        custom_keys={
            '.cls_token': dict(decay_mult=0.0),
            '.pos_embed': dict(decay_mult=0.0)
        }))
```
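With this config, parameters whose names contain '.cls_token' or '.pos_embed' use decay_mult=0.0 and therefore receive no weight decay; normalization and one-dimensional parameters are likewise excluded from weight decay, while the layer-wise factor of 0.75 progressively lowers the learning rate of earlier backbone layers.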