

class mmpretrain.models.backbones.EfficientFormer(arch='l1', in_channels=3, pool_size=3, mlp_ratios=4, reshape_last_feat=False, out_indices=-1, frozen_stages=-1, act_cfg={'type': 'GELU'}, drop_rate=0.0, drop_path_rate=0.0, use_layer_scale=True, init_cfg=None)[source]


A PyTorch implementation of EfficientFormer, introduced in the paper "EfficientFormer: Vision Transformers at MobileNet Speed".

Modified from the official repo <>.

  • arch (str | dict) –

    The model's architecture. If a string, it should be one of the architectures in EfficientFormer.arch_settings. If a dict, it should include the following 4 keys:

    • layers (list[int]): Number of blocks at each stage.

    • embed_dims (list[int]): The number of channels at each stage.

    • downsamples (list[int]): Whether each of the four stages applies downsampling.

    • vit_num (int): The number of ViT blocks in the last stage.

    Defaults to 'l1'.

  • in_channels (int) – The number of input channels. Defaults to 3.

  • pool_size (int) – The pooling size of Meta4D blocks. Defaults to 3.

  • mlp_ratios (int) – The ratio of the MLP hidden dimension to the embedding dimension in Meta4D blocks. Defaults to 4.

  • reshape_last_feat (bool) – Whether to reshape the last-stage feature map from (B, N, C) to (B, C, H, W) when vit_num in arch is not 0. Defaults to False. Usually set to True for downstream tasks.

  • out_indices (Sequence[int]) – Indices of the stages whose outputs are returned. Defaults to -1 (the last stage).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • act_cfg (dict) – The config dict for the activation between pointwise convolutions. Defaults to dict(type='GELU').

  • drop_rate (float) – Dropout rate. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • use_layer_scale (bool) – Whether to use layer scale in MetaFormer blocks. Defaults to True.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.
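
To illustrate the dict form of arch, a custom architecture could be written roughly as follows. The numbers are arbitrary placeholders chosen for this sketch, not one of the presets in EfficientFormer.arch_settings, and the resulting model has not been validated. The sanity checks and the name custom_arch are hypothetical helpers for illustration only; the backbone dict assumes the usual type-keyed config style with the class registered as 'EfficientFormer'.

```python
# Hypothetical custom architecture; every value here is illustrative only.
custom_arch = dict(
    layers=[2, 2, 4, 2],                    # number of blocks at each stage
    embed_dims=[32, 64, 128, 256],          # channels at each stage
    downsamples=[False, True, True, True],  # whether each stage downsamples
    vit_num=1,                              # ViT blocks in the last stage
)

# Sanity checks written for this sketch (not part of the library).
required_keys = {'layers', 'embed_dims', 'downsamples', 'vit_num'}
assert set(custom_arch) == required_keys
assert all(len(custom_arch[k]) == 4
           for k in ('layers', 'embed_dims', 'downsamples'))
# Assumption: the ViT blocks are part of the last stage, so vit_num
# should not exceed the number of blocks in that stage.
assert 0 <= custom_arch['vit_num'] <= custom_arch['layers'][-1]

# Config-style description of the backbone (assumed registry name).
backbone_cfg = dict(
    type='EfficientFormer',
    arch=custom_arch,
    out_indices=(0, 1, 2, 3),
    reshape_last_feat=True,
)
```

The same custom_arch dict could presumably be passed directly as EfficientFormer(arch=custom_arch) when constructing the module in Python.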


>>> from mmpretrain.models import EfficientFormer
>>> import torch
>>> inputs = torch.rand((1, 3, 224, 224))
>>> # build EfficientFormer backbone for classification task
>>> model = EfficientFormer(arch="l1")
>>> model.eval()
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 448, 49)
>>> # build EfficientFormer backbone for downstream task
>>> model = EfficientFormer(
...     arch="l3",
...     out_indices=(0, 1, 2, 3),
...     reshape_last_feat=True)
>>> model.eval()
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 320, 14, 14)
(1, 512, 7, 7)
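
The shapes printed for the "l3" example can be reproduced with simple arithmetic, which may help when sizing downstream heads. The sketch below is not part of mmpretrain: expected_shapes is a hypothetical helper, and it assumes a stem that reduces resolution by 4 (inferred from 224 to 56) and a factor-2 reduction at each stage flagged for downsampling. The embed_dims and downsamples values are read off the printed output above rather than taken from the library.

```python
def expected_shapes(embed_dims, downsamples, input_size=224, batch=1,
                    stem_stride=4):
    """Return the (B, C, H, W) shape expected after each stage.

    Assumes square inputs whose side is divisible by the total stride,
    a stride-4 stem, and stride-2 downsampling where flagged.
    """
    size = input_size // stem_stride
    shapes = []
    for channels, down in zip(embed_dims, downsamples):
        if down:
            size //= 2  # a flagged stage halves the spatial resolution
        shapes.append((batch, channels, size, size))
    return shapes


# Values inferred from the "l3" output shown above.
for shape in expected_shapes([64, 128, 320, 512],
                             [False, True, True, True]):
    print(shape)
# (1, 64, 56, 56)
# (1, 128, 28, 28)
# (1, 320, 14, 14)
# (1, 512, 7, 7)
```

With reshape_last_feat=False, the first example suggests the last stage is instead returned flattened, with 7 * 7 = 49 positions in the final dimension.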