MobileViT
- class mmpretrain.models.backbones.MobileViT(arch='small', in_channels=3, stem_channels=16, last_exp_factor=4, out_indices=(4,), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'Swish'}, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])
MobileViT backbone.
A PyTorch implementation of “MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer”.
Modified from the official repo and timm.
- Parameters:
arch (str | List[list]) – Architecture of MobileViT.
If a string, choose from “small”, “x_small” and “xx_small”.
If a list, every item should also be a list. The first item of each sub-list is either “mobilenetv2” or “mobilevit” and indicates the type of that layer sequence. For “mobilenetv2”, the remaining items are the arguments of make_mobilenetv2_layer (except in_channels); for “mobilevit”, the remaining items are the arguments of make_mobilevit_layer (except in_channels).
Defaults to “small”.
in_channels (int) – Number of input image channels. Defaults to 3.
stem_channels (int) – Channels of stem layer. Defaults to 16.
last_exp_factor (int) – Channels expand factor of last layer. Defaults to 4.
out_indices (Sequence[int]) – Output from which stages. Defaults to (4, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Defaults to -1, which means not freezing any parameters.
conv_cfg (dict, optional) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict, optional) – Config dict for normalization layer. Defaults to dict(type='BN').
act_cfg (dict, optional) – Config dict for activation layer. Defaults to dict(type='Swish').
init_cfg (dict, optional) – Initialization config dict.
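As an illustration of the list form of arch, the following is a minimal sketch of a backbone config written as a plain Python dict. The channel widths, strides and block counts are illustrative assumptions, and check_arch is a hypothetical helper (not part of mmpretrain) that only verifies that each entry carries a known type tag and the number of arguments the two static methods document.

```python
# Each entry is [layer_type, *args]; args follow the documented static-method
# signatures with in_channels left out. All numbers are illustrative.
custom_arch = [
    # 'mobilenetv2': out_channels, stride, num_blocks, expand_ratio
    ['mobilenetv2', 32, 1, 1, 4],
    ['mobilenetv2', 64, 2, 3, 4],
    # 'mobilevit': out_channels, stride, transformer_dim, ffn_dim,
    #              num_transformer_blocks, expand_ratio
    ['mobilevit', 96, 2, 144, 288, 2, 4],
    ['mobilevit', 128, 2, 192, 384, 4, 4],
    ['mobilevit', 160, 2, 240, 480, 3, 4],
]

# Backbone config fragment using the custom architecture.
backbone = dict(type='MobileViT', arch=custom_arch)

# Allowed number of arguments after the type tag (expand_ratio is optional).
ARG_COUNTS = {
    'mobilenetv2': (3, 4),
    'mobilevit': (5, 6),
}


def check_arch(arch):
    """Hypothetical sanity check for a custom arch list."""
    for index, (layer_type, *args) in enumerate(arch):
        if layer_type not in ARG_COUNTS:
            raise ValueError(f'entry {index}: unknown layer type {layer_type!r}')
        low, high = ARG_COUNTS[layer_type]
        if not low <= len(args) <= high:
            raise ValueError(
                f'entry {index}: expected {low} to {high} arguments, got {len(args)}')
    return True


check_arch(custom_arch)
```

A check of this kind catches a misspelled type tag or a missing argument before the model is built.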
- static make_mobilenetv2_layer(in_channels, out_channels, stride, num_blocks, expand_ratio=4)
Build mobilenetv2 layer, which consists of several InvertedResidual layers.
- Parameters:
in_channels (int) – The input channels.
out_channels (int) – The output channels.
stride (int) – The stride of the first 3x3 convolution in the InvertedResidual layers.
num_blocks (int) – The number of InvertedResidual blocks.
expand_ratio (int) – Adjusts the number of channels of the hidden layer in InvertedResidual by this amount. Defaults to 4.
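To make the roles of these arguments concrete, here is a minimal sketch that lists the per-block channel counts and strides such a layer would have. It does not call mmpretrain. It assumes, following the usual MobileNetV2 design, that only the first block uses the given stride, that later blocks take out_channels as their input, and that the hidden width is the block's input channels times expand_ratio. plan_mobilenetv2_layer is a hypothetical name.

```python
def plan_mobilenetv2_layer(in_channels, out_channels, stride, num_blocks,
                           expand_ratio=4):
    """Return (in_channels, hidden_channels, out_channels, stride) per block.

    Assumption: stride applies to the first block only and the hidden width
    is in_channels * expand_ratio, as in the MobileNetV2 inverted residual.
    """
    plan = []
    for i in range(num_blocks):
        block_stride = stride if i == 0 else 1
        hidden_channels = int(round(in_channels * expand_ratio))
        plan.append((in_channels, hidden_channels, out_channels, block_stride))
        # Every block after the first consumes the previous block's output.
        in_channels = out_channels
    return plan


for block in plan_mobilenetv2_layer(32, 64, stride=2, num_blocks=3):
    print(block)
# (32, 128, 64, 2)
# (64, 256, 64, 1)
# (64, 256, 64, 1)
```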
- static make_mobilevit_layer(in_channels, out_channels, stride, transformer_dim, ffn_dim, num_transformer_blocks, expand_ratio=4)
Build mobilevit layer, which consists of one InvertedResidual and one MobileVitBlock.
- Parameters:
in_channels (int) – The input channels.
out_channels (int) – The output channels.
stride (int) – The stride of the first 3x3 convolution in the InvertedResidual layers.
transformer_dim (int) – The channels of the transformer layers.
ffn_dim (int) – The mid-channels of the feedforward network in transformer layers.
num_transformer_blocks (int) – The number of transformer blocks.
expand_ratio (int) – Adjusts the number of channels of the hidden layer in InvertedResidual by this amount. Defaults to 4.
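The sketch below shows how the arguments could be divided between the two sub-modules and how a custom arch entry maps onto this signature. It is a plain-Python illustration, not the library code. plan_mobilevit_layer is a hypothetical name, the keyword names given to the MobileVitBlock entry are assumptions, and so is the choice to feed it out_channels as its input width.

```python
def plan_mobilevit_layer(in_channels, out_channels, stride, transformer_dim,
                         ffn_dim, num_transformer_blocks, expand_ratio=4):
    """Describe the two sub-modules of a mobilevit layer as plain dicts."""
    return [
        # Convolutional part: handles the stride and the channel change.
        dict(block='InvertedResidual', in_channels=in_channels,
             out_channels=out_channels, stride=stride,
             expand_ratio=expand_ratio),
        # Transformer part (assumed to operate on out_channels feature maps).
        dict(block='MobileVitBlock', in_channels=out_channels,
             transformer_dim=transformer_dim, ffn_dim=ffn_dim,
             num_transformer_blocks=num_transformer_blocks),
    ]


# A custom arch entry omits in_channels; here it is supplied by hand as the
# output width of an imagined previous layer (64, an illustrative value).
entry = ['mobilevit', 96, 2, 144, 288, 2]
layer_type, *args = entry
for sub_module in plan_mobilevit_layer(64, *args):
    print(sub_module)
```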