class mmpretrain.models.backbones.HiViT(arch='base', img_size=224, patch_size=16, inner_patches=4, in_chans=3, stem_mlp_ratio=3.0, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, out_indices=[23], ape=True, rpe=False, patch_norm=True, frozen_stages=-1, kernel_size=None, pad_size=None, layer_scale_init_value=0.0, init_cfg=None)[source]


A PyTorch implement of: HiViT: A Simple and More Efficient Design of Hierarchical Vision Transformer.

  • arch (str | dict) –

    Swin Transformer architecture. If use string, choose from ‘tiny’, ‘small’, and’base’. If use dict, it should have below keys:

    • embed_dims (int): The dimensions of embedding.

    • depths (List[int]): The number of blocks in each stage.

    • num_heads (int): The number of heads in attention modules of each stage.

  • 'tiny'. (Defaults to) –

  • img_size (int) – Input image size.

  • patch_size (int) – Patch size. Defaults to 16.

  • inner_patches (int) – Inner patch. Defaults to 4.

  • in_chans (int) – Number of image input channels.

  • embed_dim (int) – Transformer embedding dimension.

  • depths (list[int]) – Number of successive HiViT blocks.

  • num_heads (int) – Number of attention heads.

  • stem_mlp_ratio (int) – Ratio of MLP hidden dim to embedding dim in the first two stages.

  • mlp_ratio (int) – Ratio of MLP hidden dim to embedding dim in the last stage.

  • qkv_bias (bool) – Enable bias for qkv projections if True.

  • qk_scale (float) – The number of divider after q@k. Default to None.

  • drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Defaults to 0.

  • attn_drop_rate (float) – The drop out rate for attention output weights. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN').

  • ape (bool) – If True, add absolute position embedding to the patch embedding.

  • rpe (bool) – If True, add relative position embedding to the patch embedding.

  • patch_norm (bool) – If True, use norm_cfg for normalization layer.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • kernel_size (int) – Kernel size.

  • pad_size (int) – Pad size.

  • layer_scale_init_value (float) – Layer-scale init values. Defaults to 0.

  • init_cfg (dict, optional) – The extra config for initialization. Defaults to None.

get_layer_depth(param_name, prefix='')[source]

Get the layer-wise depth of a parameter.

  • param_name (str) – The name of the parameter.

  • prefix (str) – The prefix for the parameter. Defaults to an empty string.


The layer-wise depth and the num of layers.

Return type:

Tuple[int, int]


The first depth is the stem module (layer_depth=0), and the last depth is the subsequent module (layer_depth=num_layers-1)

Read the Docs v: latest
On Read the Docs
Project Home

Free document hosting provided by Read the Docs.