LeViT
- class mmpretrain.models.backbones.LeViT(arch, img_size=224, patch_size=16, attn_ratio=2, mlp_ratio=2, act_cfg={'type': 'HSwish'}, hybrid_backbone=HybridBackbone, out_indices=-1, deploy=False, drop_path_rate=0, init_cfg=None)[source]
LeViT backbone.
A PyTorch implementation of LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference.
Modified from the official implementation: https://github.com/facebookresearch/LeViT
- Parameters:
arch (str | dict) – LeViT architecture. If a string, choose from '128s', '128', '192', '256' and '384'. If a dict, it should have the following keys:
embed_dims (List[int]): The embed dimensions of each stage.
key_dims (List[int]): The embed dimensions of the keys in the attention layers of each stage.
num_heads (List[int]): The number of attention heads in each stage.
depths (List[int]): The number of blocks in each stage.
img_size (int) – Input image size. Defaults to 224.
patch_size (int) – The patch size. Defaults to 16.
attn_ratio (int) – Ratio of the hidden dimension of the values in the attention layers. Defaults to 2.
mlp_ratio (int) – Ratio of the hidden dimension in the MLP layers. Defaults to 2.
act_cfg (dict) – The config of the activation functions. Defaults to dict(type='HSwish').
hybrid_backbone (callable) – A callable object that builds the patch embedding module. Defaults to HybridBackbone.
out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.
deploy (bool) – Whether to switch the model structure to deployment mode. Defaults to False.
drop_path_rate (float) – The drop path rate. Defaults to 0.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.