LeViT
- class mmpretrain.models.backbones.LeViT(arch, img_size=224, patch_size=16, attn_ratio=2, mlp_ratio=2, act_cfg={'type': 'HSwish'}, hybrid_backbone=HybridBackbone, out_indices=-1, deploy=False, drop_path_rate=0, init_cfg=None)[source]
LeViT backbone.
A PyTorch implementation of LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference.
Modified from the official implementation: https://github.com/facebookresearch/LeViT
- Parameters:
arch (str | dict) – LeViT architecture. If a string, choose from '128s', '128', '192', '256' and '384'. If a dict, it should have the following keys:
embed_dims (List[int]): The embed dimensions of each stage.
key_dims (List[int]): The embed dimensions of the keys in the attention layers of each stage.
num_heads (List[int]): The number of attention heads in each stage.
depths (List[int]): The number of blocks in each stage.
img_size (int) – Input image size. Defaults to 224.
patch_size (int) – The patch size. Defaults to 16.
attn_ratio (int) – Ratio of the hidden dimension of the values in the attention layers. Defaults to 2.
mlp_ratio (int) – Ratio of the hidden dimension in the MLP layers. Defaults to 2.
act_cfg (dict) – The config of the activation functions. Defaults to dict(type='HSwish').
hybrid_backbone (callable) – A callable object that builds the patch embedding module. Defaults to HybridBackbone.
out_indices (Sequence | int) – Output from which stages. Defaults to -1, meaning the last stage.
deploy (bool) – Whether to switch the model structure to deployment mode. Defaults to False.
drop_path_rate (float) – The drop path rate. Defaults to 0.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.