class mmpretrain.models.backbones.DaViT(arch='t', patch_size=4, in_channels=3, window_size=7, ffn_ratio=4.0, qkv_bias=True, drop_path_rate=0.1, out_after_downsample=False, pad_small_map=False, norm_cfg={'type': 'LN'}, stage_cfgs={}, frozen_stages=-1, norm_eval=False, out_indices=(3,), with_cp=False, init_cfg=None)


A PyTorch implementation of DaViT: Dual Attention Vision Transformers.

Inspiration from

Parameters:

  • arch (str | dict) –

    DaViT architecture. If a string, choose from ‘tiny’, ‘small’, ‘base’, ‘large’, ‘huge’ and ‘giant’ (the default ‘t’ is the short form of ‘tiny’). If a dict, it should have the following keys:

    • embed_dims (int): The dimensions of embedding.

    • depths (List[int]): The number of blocks in each stage.

    • num_heads (List[int]): The number of heads in attention modules of each stage.

    Defaults to ‘t’.

  • patch_size (int | tuple) – The patch size in patch embedding. Defaults to 4.

  • in_channels (int) – The number of input channels. Defaults to 3.

  • window_size (int) – The height and width of the window. Defaults to 7.

  • ffn_ratio (float) – The expansion ratio of feedforward network hidden layer channels. Defaults to 4.

  • qkv_bias (bool) – Whether to add bias for qkv in attention modules. Defaults to True.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1.

  • out_after_downsample (bool) – Whether to output the feature map of a stage after the following downsample layer. Defaults to False.

  • pad_small_map (bool) – If True, pad small feature maps up to the window size, which is commonly used in detection and segmentation. If False, avoid shifting the window and shrink the window size to the size of the feature map, which is commonly used in classification. Defaults to False.

  • norm_cfg (dict) – Config dict for the normalization layer applied to all output features. Defaults to dict(type='LN').

  • stage_cfgs (Sequence[dict] | dict) – Extra config dict for each stage. Defaults to an empty dict.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: this only affects Batch Norm and its variants. Defaults to False.

  • out_indices (Sequence | int) – The indices of the stages to output. Defaults to (3,), i.e. the last stage.

  • with_cp (bool) – Whether to use gradient checkpointing. Checkpointing saves some memory at the cost of slower training. Defaults to False.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.
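
The `arch` argument accepts either a preset name or a custom dict with `embed_dims`, `depths` and `num_heads`. The sketch below shows one way such an argument could be validated. `resolve_arch` and the preset table passed to it are hypothetical illustrations, not mmpretrain's internal code, and the custom values are chosen only for the example.

```python
from typing import Dict, Union

# Keys a custom ``arch`` dict must provide, per the parameter description.
REQUIRED_KEYS = {'embed_dims', 'depths', 'num_heads'}


def resolve_arch(arch: Union[str, dict], arch_zoo: Dict[str, dict]) -> dict:
    """Hypothetical helper: turn an ``arch`` argument into a settings dict.

    ``arch_zoo`` maps preset names to settings dicts; its contents are
    supplied by the caller and are NOT mmpretrain's real presets.
    """
    if isinstance(arch, str):
        key = arch.lower()
        if key not in arch_zoo:
            raise ValueError(f'Unknown arch {arch!r}; choose from {sorted(arch_zoo)}')
        return dict(arch_zoo[key])
    if not isinstance(arch, dict):
        raise TypeError('arch must be a str or a dict')
    missing = REQUIRED_KEYS - set(arch)
    if missing:
        raise ValueError(f'Custom arch is missing keys: {sorted(missing)}')
    # Every stage needs both a block count and a head count.
    if len(arch['depths']) != len(arch['num_heads']):
        raise ValueError('depths and num_heads must have one entry per stage')
    return dict(arch)


# Illustrative custom architecture (values chosen for the example only).
custom = {'embed_dims': 64, 'depths': [1, 1, 2, 1], 'num_heads': [2, 4, 8, 16]}
settings = resolve_arch(custom, arch_zoo={})
print(len(settings['depths']))  # number of stages
```

Checking that `depths` and `num_heads` have the same length catches the most common mistake in a hand-written config: every stage needs both values.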
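
`drop_path_rate` sets the stochastic depth rate. Hierarchical vision transformers often spread it linearly over all blocks instead of applying one rate everywhere. Assuming such a linear schedule (a common convention that this page does not confirm for DaViT), the per-block rates could be computed as follows. `drop_path_schedule` is a hypothetical helper, and the depth list is illustrative.

```python
from typing import List


def drop_path_schedule(depths: List[int], drop_path_rate: float) -> List[List[float]]:
    """Hypothetical helper: assumed linear stochastic-depth rates, per stage."""
    total = sum(depths)
    if total == 1:
        rates = [0.0]
    else:
        # Evenly spaced from 0 up to drop_path_rate over all blocks.
        rates = [drop_path_rate * i / (total - 1) for i in range(total)]
    # Split the flat list back into one sub-list per stage.
    per_stage, start = [], 0
    for d in depths:
        per_stage.append(rates[start:start + d])
        start += d
    return per_stage


print(drop_path_schedule([1, 1, 3, 1], 0.1))
```

Under this assumption the first block is never dropped and only the deepest block reaches the full `drop_path_rate`.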
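
The `pad_small_map` description can be read as a rule for choosing the attention window on an h x w feature map. `window_layout` is a hypothetical helper that only mirrors that wording. The real attention module may differ in detail, for example in how it handles non-square maps.

```python
import math
from typing import Tuple


def window_layout(h: int, w: int, window_size: int,
                  pad_small_map: bool) -> Tuple[int, int, int]:
    """Hypothetical helper: return (window, padded_h, padded_w) for an h x w map."""
    window = window_size
    if not pad_small_map and min(h, w) <= window_size:
        # Classification-style: shrink the window to fit the small map.
        window = min(h, w)
    # Pad each side up to a multiple of the window so windows tile exactly.
    padded_h = math.ceil(h / window) * window
    padded_w = math.ceil(w / window) * window
    return window, padded_h, padded_w


print(window_layout(5, 5, 7, pad_small_map=True))     # (7, 7, 7)
print(window_layout(5, 5, 7, pad_small_map=False))    # (5, 5, 5)
print(window_layout(10, 12, 7, pad_small_map=False))  # (7, 14, 14)
```

With padding, a 5 x 5 map is zero-padded into a single 7 x 7 window. Without it, the window shrinks to 5 x 5 and no padding is needed.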
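
To reason about which feature maps `out_indices` and `out_after_downsample` select, the sketch below makes three assumptions: the network has four stages, as the default `out_indices=(3,)` suggests; the patch embedding has stride `patch_size`; and every stage except the last ends with a 2x downsample. Both helpers are hypothetical, and negative indices are assumed to count from the last stage.

```python
from typing import List, Sequence, Union


def normalize_out_indices(out_indices: Union[int, Sequence[int]],
                          num_stages: int) -> List[int]:
    """Hypothetical helper: accept an int or a sequence; map negatives."""
    if isinstance(out_indices, int):
        out_indices = [out_indices]
    result = []
    for idx in out_indices:
        if idx < 0:
            idx += num_stages  # assumed: -1 -> last stage
        if not 0 <= idx < num_stages:
            raise IndexError(f'Invalid out_index {idx} for {num_stages} stages')
        result.append(idx)
    return result


def stage_resolutions(img_size: int, patch_size: int, num_stages: int,
                      out_after_downsample: bool) -> List[int]:
    """Hypothetical helper: assumed side length of each stage's output."""
    base = img_size // patch_size
    sizes = []
    for i in range(num_stages):
        size = base // 2 ** i
        # Assumption: every stage except the last ends with a 2x downsample.
        if out_after_downsample and i < num_stages - 1:
            size //= 2
        sizes.append(size)
    return sizes


print(normalize_out_indices(-1, 4))                              # [3]
print(stage_resolutions(224, 4, 4, out_after_downsample=False))  # [56, 28, 14, 7]
print(stage_resolutions(224, 4, 4, out_after_downsample=True))   # [28, 14, 7, 7]
```

Under these assumptions, a 224 x 224 input with the default `patch_size=4` produces 56, 28, 14 and 7 pixel feature maps. These sizes are all multiples of the default `window_size=7`.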
