MobileViT

class mmpretrain.models.backbones.MobileViT(arch='small', in_channels=3, stem_channels=16, last_exp_factor=4, out_indices=(4,), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'Swish'}, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])

MobileViT backbone.

A PyTorch implementation of the paper “MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer”.

Modified from the official repo and timm.

Parameters:
  • arch (str | List[list]) –

    Architecture of MobileViT.

    • If a string, choose from “small”, “x_small” and “xx_small”.

    • If a list, every item should also be a list. The first item of each sub-list is either “mobilenetv2” or “mobilevit” and indicates the type of this layer sequence. For “mobilenetv2”, the remaining items are the arguments of make_mobilenetv2_layer (except in_channels); for “mobilevit”, the remaining items are the arguments of make_mobilevit_layer (except in_channels).

    Defaults to “small”.

  • in_channels (int) – Number of input image channels. Defaults to 3.

  • stem_channels (int) – Channels of stem layer. Defaults to 16.

  • last_exp_factor (int) – Channels expand factor of last layer. Defaults to 4.

  • out_indices (Sequence[int]) – Indices of the stages whose outputs are returned. Defaults to (4, ).

  • frozen_stages (int) – Stages to be frozen (all parameters fixed). Defaults to -1, which means no parameters are frozen.

  • conv_cfg (dict, optional) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict, optional) – Config dict for normalization layer. Defaults to dict(type='BN').

  • act_cfg (dict, optional) – Config dict for activation layer. Defaults to dict(type='Swish').

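  • init_cfg (dict | list[dict], optional) – Initialization config dict. Defaults to Kaiming initialization for Conv2d layers and constant initialization (val=1) for _BatchNorm and GroupNorm layers.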
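
The list form of arch can be easier to read once each sub-list is paired with the argument names it stands for. The following is a minimal sketch in plain Python, not the library's own code: expand_arch and LAYER_ARGS are hypothetical names, the channel values are illustrative only, and the sketch assumes that the first layer receives stem_channels as in_channels and that each later layer receives the previous layer's out_channels. The final dict shows how such a list might be passed to the backbone in the usual OpenMMLab config style.

```python
# Argument names (after in_channels) as documented for the two static methods.
LAYER_ARGS = {
    'mobilenetv2': ('out_channels', 'stride', 'num_blocks', 'expand_ratio'),
    'mobilevit': ('out_channels', 'stride', 'transformer_dim', 'ffn_dim',
                  'num_transformer_blocks', 'expand_ratio'),
}

# Illustrative values only; not claimed to match any built-in preset.
custom_arch = [
    ['mobilenetv2', 32, 1, 1, 4],
    ['mobilenetv2', 64, 2, 3, 4],
    ['mobilevit', 96, 2, 144, 288, 2, 4],
    ['mobilevit', 128, 2, 192, 384, 4, 4],
    ['mobilevit', 160, 2, 240, 480, 3, 4],
]


def expand_arch(arch, stem_channels=16):
    """Hypothetical helper: pair each entry's values with argument names.

    Assumes the first layer takes stem_channels as in_channels and every
    later layer takes the previous layer's out_channels.
    """
    in_channels = stem_channels
    expanded = []
    for layer_type, *values in arch:
        names = LAYER_ARGS[layer_type]  # KeyError for an unknown layer type
        if len(values) > len(names):
            raise ValueError(f'too many arguments for {layer_type}: {values}')
        kwargs = {'in_channels': in_channels, **dict(zip(names, values))}
        expanded.append((layer_type, kwargs))
        in_channels = kwargs['out_channels']
    return expanded


layers = expand_arch(custom_arch)
for layer_type, kwargs in layers:
    print(layer_type, kwargs)

# Backbone entry in config style; five layers, so index 4 is the last stage.
backbone = dict(type='MobileViT', arch=custom_arch, out_indices=(4,))
```

Omitting a trailing value (for example expand_ratio) simply leaves it out of the keyword dict, so the documented default of 4 would apply.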

static make_mobilenetv2_layer(in_channels, out_channels, stride, num_blocks, expand_ratio=4)

Build a MobileNetV2 layer, which consists of several InvertedResidual blocks.

Parameters:
  • in_channels (int) – The input channels.

  • out_channels (int) – The output channels.

  • stride (int) – The stride of the first 3x3 convolution in the InvertedResidual layers.

  • num_blocks (int) – The number of InvertedResidual blocks.

  • expand_ratio (int) – Adjusts the number of channels of the hidden layer in InvertedResidual by this factor. Defaults to 4.
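
To make the roles of stride, num_blocks and expand_ratio more concrete, here is a rough sketch of the block sequence such a layer might contain. It is plain Python and does not call the library; plan_mobilenetv2_layer is a hypothetical helper. It rests on two assumptions that follow the common MobileNetV2 convention rather than anything stated above: only the first block applies the given stride (later blocks use stride 1), and the hidden width of each block is its input width multiplied by expand_ratio.

```python
def plan_mobilenetv2_layer(in_channels, out_channels, stride, num_blocks,
                           expand_ratio=4):
    """Hypothetical helper: list the InvertedResidual blocks of one layer."""
    blocks = []
    for i in range(num_blocks):
        blocks.append({
            'in_channels': in_channels,
            # Assumed convention: hidden width = input width * expand_ratio.
            'hidden_channels': in_channels * expand_ratio,
            'out_channels': out_channels,
            # Assumed convention: only the first block downsamples.
            'stride': stride if i == 0 else 1,
        })
        # Later blocks take the previous block's output as input.
        in_channels = out_channels
    return blocks


plan = plan_mobilenetv2_layer(32, 64, stride=2, num_blocks=3)
for block in plan:
    print(block)
```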

static make_mobilevit_layer(in_channels, out_channels, stride, transformer_dim, ffn_dim, num_transformer_blocks, expand_ratio=4)

Build a MobileViT layer, which consists of one InvertedResidual block followed by one MobileVitBlock.

Parameters:
  • in_channels (int) – The input channels.

  • out_channels (int) – The output channels.

  • stride (int) – The stride of the first 3x3 convolution in the InvertedResidual layers.

  • transformer_dim (int) – The channels of the transformer layers.

  • ffn_dim (int) – The mid-channels of the feedforward network in transformer layers.

  • num_transformer_blocks (int) – The number of transformer blocks.

  • expand_ratio (int) – Adjusts the number of channels of the hidden layer in InvertedResidual by this factor. Defaults to 4.
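
The split of arguments between the two sub-modules can be sketched as follows. This is an illustration in plain Python, not the library's implementation: plan_mobilevit_layer is a hypothetical helper, and it assumes that the InvertedResidual block handles the stride and the change from in_channels to out_channels, while the MobileVitBlock operates on out_channels and receives the transformer-related arguments.

```python
def plan_mobilevit_layer(in_channels, out_channels, stride, transformer_dim,
                         ffn_dim, num_transformer_blocks, expand_ratio=4):
    """Hypothetical helper: show which arguments each sub-module would use."""
    return [
        # Assumed: the convolutional block changes resolution and width.
        ('InvertedResidual', {
            'in_channels': in_channels,
            'out_channels': out_channels,
            'stride': stride,
            'expand_ratio': expand_ratio,
        }),
        # Assumed: the transformer block keeps the width at out_channels.
        ('MobileVitBlock', {
            'in_channels': out_channels,
            'out_channels': out_channels,
            'transformer_dim': transformer_dim,
            'ffn_dim': ffn_dim,
            'num_transformer_blocks': num_transformer_blocks,
        }),
    ]


layer_plan = plan_mobilevit_layer(64, 96, stride=2, transformer_dim=144,
                                  ffn_dim=288, num_transformer_blocks=2)
for name, kwargs in layer_plan:
    print(name, kwargs)
```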
