BEiTV2Neck

class mmpretrain.models.necks.BEiTV2Neck(num_layers=2, early_layers=9, backbone_arch='base', drop_rate=0.0, drop_path_rate=0.0, layer_scale_init_value=0.1, use_rel_pos_bias=False, norm_cfg={'eps': 1e-06, 'type': 'LN'}, init_cfg={'bias': 0, 'layer': 'Linear', 'std': 0.02, 'type': 'TruncNormal'})[source]

Neck for BEiTV2 Pre-training.

This module constructs the decoder for the final prediction.

Parameters:
  • num_layers (int) – Number of encoder layers of neck. Defaults to 2.

  • early_layers (int) – The layer index of the early output from the backbone. Defaults to 9.

  • backbone_arch (str) – Vision Transformer architecture. Defaults to 'base'.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • layer_scale_init_value (float) – The initialization value for the learnable scaling of attention and FFN. Defaults to 0.1.

  • use_rel_pos_bias (bool) – Whether to use a unique relative position bias. If False, use the shared relative position bias defined in the backbone. Defaults to False.

  • norm_cfg (dict) – Config dict for the normalization layer. Defaults to dict(type='LN', eps=1e-6).

  • init_cfg (dict, optional) – Initialization config dict. Defaults to dict(type='TruncNormal', layer='Linear', std=0.02, bias=0).
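As an illustration, the neck is typically declared as a config dict inside an mmpretrain model config. The sketch below is a hypothetical fragment assuming the parameters above; verify the field names against your mmpretrain version before use.

```python
# Hypothetical config fragment for a BEiTV2 pre-training model.
# Only fields that differ from the defaults need to be listed.
neck = dict(
    type='BEiTV2Neck',       # registered neck class name (assumed)
    num_layers=2,            # number of decoder layers in the neck
    early_layers=9,          # backbone layer index of the early output
    backbone_arch='base',    # must match the backbone architecture
    drop_path_rate=0.1,      # example override of the stochastic depth rate
)
```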

forward(inputs, rel_pos_bias, **kwargs)[source]

Get the latent prediction and final prediction.

Parameters:

  • inputs (tuple[torch.Tensor]) – The feature maps from the backbone, including the early-layer and final-layer outputs.

  • rel_pos_bias (torch.Tensor) – The shared relative position bias from the backbone.

Returns:

  • x: The final-layer features from the backbone, normalized in BEiTV2Neck.

  • x_cls_pt: The early-state features from the backbone, which consist of the final-layer cls_token and the early-state patch_tokens from the backbone, and are sent to the PatchAggregation layers in the neck.

Return type:

Tuple[torch.Tensor, torch.Tensor]

rescale_patch_aggregation_init_weight()[source]

Rescale the initialized weights.
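The rescaling scheme used in the BEiT family divides certain projection weights by a factor that grows with layer depth, which keeps the residual branches small at initialization. A minimal sketch of that idea, assuming the sqrt(2 * layer_id) factor used by BEiT-style initialization (the actual method operates on the neck's PatchAggregation layer parameters in place):

```python
import math

def rescale(weights, layer_id):
    """Divide each weight by sqrt(2 * layer_id).

    Deeper layers (larger layer_id) are scaled down more, so every
    residual branch contributes roughly equally at initialization.
    """
    scale = math.sqrt(2.0 * layer_id)
    return [w / scale for w in weights]
```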
