BEiTV2Neck¶
- class mmpretrain.models.necks.BEiTV2Neck(num_layers=2, early_layers=9, backbone_arch='base', drop_rate=0.0, drop_path_rate=0.0, layer_scale_init_value=0.1, use_rel_pos_bias=False, norm_cfg={'eps': 1e-06, 'type': 'LN'}, init_cfg={'bias': 0, 'layer': 'Linear', 'std': 0.02, 'type': 'TruncNormal'})[source]¶
Neck for BEiTV2 Pre-training.
This module constructs the decoder for the final prediction.
- Parameters:
num_layers (int) – Number of encoder layers of neck. Defaults to 2.
early_layers (int) – The layer index of the early output from the backbone. Defaults to 9.
backbone_arch (str) – Vision Transformer architecture. Defaults to base.
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
drop_path_rate (float) – stochastic depth rate. Defaults to 0.
layer_scale_init_value (float) – The initialization value for the learnable scaling of attention and FFN. Defaults to 0.1.
use_rel_pos_bias (bool) – Whether to use a unique relative position bias; if False, use the shared relative position bias defined in the backbone.
norm_cfg (dict) – Config dict for the normalization layer. Defaults to dict(type='LN').
init_cfg (dict, optional) – Initialization config dict. Defaults to None.
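As an illustration of the parameters above, a neck definition in an mmpretrain-style config might look like the sketch below (values are the documented defaults; whether such a fragment fits your overall model config depends on the rest of your setup):

```python
# Hypothetical config fragment built from the documented defaults.
neck = dict(
    type='BEiTV2Neck',
    num_layers=2,             # transformer layers in the neck
    early_layers=9,           # backbone layer index of the early output
    backbone_arch='base',     # should match the backbone's ViT architecture
    drop_rate=0.0,
    drop_path_rate=0.0,
    layer_scale_init_value=0.1,
    use_rel_pos_bias=False,   # fall back to the backbone's shared bias
    norm_cfg=dict(type='LN', eps=1e-6),
)
```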
- forward(inputs, rel_pos_bias, **kwargs)[source]¶
Get the latent prediction and final prediction.
- Parameters:
inputs (Tuple[torch.Tensor]) – Features of tokens.
rel_pos_bias (torch.Tensor) – Shared relative position bias table.
- Returns:
x: The final layer features from the backbone, which are normed in BEiTV2Neck.
x_cls_pt: The early state features from the backbone, which consist of the final layer cls_token and the early state patch tokens from the backbone, and are sent to the PatchAggregation layers in the neck.
- Return type:
Tuple[torch.Tensor, torch.Tensor]
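To make the two return values concrete, here is a minimal pure-Python sketch (no torch, tokens modeled as list entries) of how the neck assembles them; the real module operates on batched torch.Tensor inputs and additionally applies its transformer and norm layers, which this sketch omits:

```python
def beitv2_neck_outputs(early_states, final_states):
    """Mimic BEiTV2Neck's data flow on toy token lists.

    early_states / final_states: lists of tokens where index 0 is the
    cls_token and the remaining entries are patch tokens.
    """
    # x: the final-layer features (normed by the neck in the real module).
    x = list(final_states)
    # x_cls_pt: the final-layer cls_token concatenated with the early-layer
    # patch tokens, which the neck then refines with its own layers.
    x_cls_pt = [final_states[0]] + early_states[1:]
    return x, x_cls_pt

early = ["early_cls", "early_p1", "early_p2"]
final = ["final_cls", "final_p1", "final_p2"]
x, x_cls_pt = beitv2_neck_outputs(early, final)
# x_cls_pt -> ['final_cls', 'early_p1', 'early_p2']
```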