
MaskFeatViT

class mmpretrain.models.selfsup.MaskFeatViT(arch='b', img_size=224, patch_size=16, out_indices=-1, drop_rate=0, drop_path_rate=0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, final_norm=True, out_type='raw', interpolate_mode='bicubic', patch_cfg={}, layer_cfgs={}, init_cfg=None)[source]

Vision Transformer for MaskFeat pre-training.

A PyTorch implementation of: Masked Feature Prediction for Self-Supervised Visual Pre-Training.

Parameters:
  • arch (str | dict) – Vision Transformer architecture. Defaults to 'b'.

  • img_size (int | tuple) – Input image size. Defaults to 224.

  • patch_size (int | tuple) – The patch size. Defaults to 16.

  • out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.

  • drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Defaults to True.

  • out_type (str) –

    The type of output features. Please choose from

    • "cls_token": The class token tensor with shape (B, C).

    • "featmap": The feature map tensor from the patch tokens with shape (B, C, H, W).

    • "avg_featmap": The global averaged feature map tensor with shape (B, C).

    • "raw": The raw feature tensor includes patch tokens and class tokens with shape (B, L, C).

    This argument only takes effect when the input mask is None. Defaults to "raw".

  • interpolate_mode (str) – Select the interpolate mode for position embedding vector resize. Defaults to "bicubic".

  • patch_cfg (dict) – Configs of patch embedding. Defaults to an empty dict.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in the encoder. Defaults to an empty dict.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.
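In the OpenMMLab ecosystem, these parameters are usually set through a config dict rather than by instantiating the class directly. The fragment below is a hypothetical sketch in mmpretrain's config style: only the backbone entry is filled in, its values simply mirror the defaults listed above, and the enclosing 'MaskFeat' model type and the elided components are assumptions, not taken from this page.

```python
# Hypothetical config sketch selecting MaskFeatViT as the backbone of a
# MaskFeat pre-training model. Surrounding components are elided.
model = dict(
    type='MaskFeat',               # assumed wrapper algorithm name
    backbone=dict(
        type='MaskFeatViT',
        arch='b',                  # ViT-Base
        img_size=224,
        patch_size=16,
        drop_path_rate=0.0,
    ),
    # neck=..., head=..., target_generator=...  (omitted here)
)
```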

forward(x, mask)[source]

Generate features for masked images.

The function supports two kinds of forward behavior. If the mask is not None, the forward function is executed as masked image modeling pre-training; if the mask is None, the forward function calls super().forward(), which extracts features from images without a mask.
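The masked branch boils down to replacing the patch embeddings at masked positions with a learnable mask token before the transformer encoder runs. A minimal sketch of that substitution in plain PyTorch is shown below; the tensor names and shapes are illustrative (ViT-B dimensions assumed), not the library's internals.

```python
import torch

B, L, C = 2, 196, 768                # batch, patch tokens, embed dim (ViT-B)
patch_tokens = torch.randn(B, L, C)  # output of the patch embedding layer
mask_token = torch.zeros(1, 1, C)    # a learnable nn.Parameter in the real model
mask = torch.zeros(B, L)
mask[:, :98] = 1                     # mask the first half of the patches

# Blend: masked positions take the mask token, the rest keep their embedding.
w = mask.unsqueeze(-1).type_as(patch_tokens)
tokens = patch_tokens * (1 - w) + mask_token * w
```

The encoder then processes `tokens` (plus the cls token and position embeddings), and the pre-training head predicts HOG features at the masked positions.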

Parameters:

  • x (torch.Tensor) – Input images.

  • mask (torch.Tensor) – Masks for the input images.

Returns:

Features with cls_tokens.

Return type:

torch.Tensor

init_weights()[source]

Initialize position embedding, mask token and cls token.
