ViTEVA02

class mmpretrain.models.backbones.ViTEVA02(arch='tiny', sub_ln=False, drop_rate=0.0, attn_drop_rate=0.0, proj_drop_rate=0.0, drop_path_rate=0.0, qkv_bias=True, norm_cfg={'type': 'LN'}, with_cls_token=True, layer_cfgs={}, **kwargs)[source]

EVA02 Vision Transformer.

A PyTorch implementation of EVA-02: A Visual Representation for Neon Genesis.

Parameters:
  • arch (str | dict) –

    Vision Transformer architecture. If a string, choose from ‘tiny’, ‘small’, ‘base’ and ‘large’. If a dict, it should have the keys below:

    • embed_dims (int): The dimensions of embedding.

    • num_layers (int): The number of transformer encoder layers.

    • num_heads (int): The number of heads in attention modules.

    • mlp_ratio (float): The ratio of the hidden dimension to the embedding dimension in the MLP module.

    Defaults to ‘tiny’.

  • sub_ln (bool) – Whether to add sub layer normalization in the SwiGLU module. Defaults to False.

  • drop_rate (float) – Probability of an element to be zeroed in the MLP module. Defaults to 0.

  • attn_drop_rate (float) – Probability of an element to be zeroed after the softmax in the attention. Defaults to 0.

  • proj_drop_rate (float) – Probability of an element to be zeroed after projection in the attention. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • qkv_bias (bool) – Whether to add bias for qkv in attention modules. Defaults to True.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • with_cls_token (bool) – Whether to concatenate the class token with the image tokens as transformer input. Defaults to True.

  • layer_cfgs (Sequence | dict) – Configs of each transformer layer in encoder. Defaults to an empty dict.

  • **kwargs (dict, optional) – Other args for Vision Transformer.
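The parameters above can be set either as keyword arguments when instantiating the class directly, or through a config dict as is conventional in mmpretrain. A minimal sketch of the latter, assuming the usual `ImageClassifier` wrapper and a hypothetical head configuration (the `num_classes` and `drop_path_rate` values here are illustrative, not recommended settings):

```python
# Config-style fragment as used in mmpretrain config files.
# The backbone section maps directly to ViTEVA02's constructor arguments.
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='ViTEVA02',
        arch='tiny',            # selects embed_dims / num_layers / num_heads / mlp_ratio preset
        sub_ln=False,           # no extra sub layer normalization in SwiGLU
        drop_path_rate=0.1,     # illustrative stochastic depth rate
        with_cls_token=True,    # prepend the class token to the image tokens
    ),
    head=dict(
        type='LinearClsHead',   # assumed head; any classification head works here
        num_classes=1000,
        in_channels=192,        # must match embed_dims of the 'tiny' arch
    ),
)
```

Alternatively, `arch` can be a dict with the four keys listed above (`embed_dims`, `num_layers`, `num_heads`, `mlp_ratio`) to define a custom variant.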
