EfficientFormer

class mmpretrain.models.backbones.EfficientFormer(arch='l1', in_channels=3, pool_size=3, mlp_ratios=4, reshape_last_feat=False, out_indices=-1, frozen_stages=-1, act_cfg={'type': 'GELU'}, drop_rate=0.0, drop_path_rate=0.0, use_layer_scale=True, init_cfg=None)[source]

EfficientFormer.

A PyTorch implementation of EfficientFormer introduced by the paper: EfficientFormer: Vision Transformers at MobileNet Speed

Modified from the official repo <https://github.com/snap-research/EfficientFormer>.

Parameters:
  • arch (str | dict) –

    The model’s architecture. If a string, it should be one of the architectures defined in EfficientFormer.arch_settings. If a dict, it should include the following 4 keys:

    • layers (list[int]): Number of blocks at each stage.

    • embed_dims (list[int]): The number of channels at each stage.

    • downsamples (list[bool]): Whether each of the four stages has a downsample layer.

    • vit_num (int): The number of ViT blocks in the last stage.

    Defaults to ‘l1’.
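
    Besides the preset strings, arch accepts a dict with these four keys. A minimal sketch whose values mirror the paper's L1 variant (verify against EfficientFormer.arch_settings before relying on them):

    ```python
    # Custom architecture dict with the four required keys.
    arch = dict(
        layers=[3, 2, 6, 4],                    # blocks per stage
        embed_dims=[48, 96, 224, 448],          # channels per stage
        downsamples=[False, True, True, True],  # downsample at each stage
        vit_num=1,                              # ViT blocks in the last stage
    )
    # model = EfficientFormer(arch=arch)  # requires mmpretrain to be installed
    ```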

  • in_channels (int) – The number of input channels. Defaults to 3.

  • pool_size (int) – The pooling size of Meta4D blocks. Defaults to 3.

  • mlp_ratios (int) – The expansion ratio of the MLP hidden dimension in Meta4D blocks. Defaults to 4.

  • reshape_last_feat (bool) – Whether to reshape the feature map from (B, N, C) to (B, C, H, W) in the last stage, when the vit-num in arch is not 0. Defaults to False. Usually set to True in downstream tasks.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to -1.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • act_cfg (dict) – The config dict for the activation between the pointwise convolutions. Defaults to dict(type='GELU').

  • drop_rate (float) – Dropout rate. Defaults to 0.0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.0.

  • use_layer_scale (bool) – Whether to use layer scale in the MetaFormer blocks. Defaults to True.

  • init_cfg (dict, optional) – Initialization config dict. Defaults to None.
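
In an OpenMMLab-style config file, the backbone is usually selected through a dict whose keys mirror the constructor arguments above. A hedged sketch (the variable name and chosen values are illustrative assumptions, not defaults from the library):

```python
# Hypothetical backbone entry for an mmpretrain config file;
# keys correspond to the constructor arguments documented above.
backbone = dict(
    type='EfficientFormer',
    arch='l1',                # one of the presets in arch_settings
    drop_path_rate=0.1,       # stochastic depth
    out_indices=(0, 1, 2, 3), # emit all four stages
    reshape_last_feat=True,   # (B, C, H, W) output for dense downstream heads
)
```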

Examples

>>> from mmpretrain.models import EfficientFormer
>>> import torch
>>> inputs = torch.rand((1, 3, 224, 224))
>>> # build EfficientFormer backbone for classification task
>>> model = EfficientFormer(arch="l1")
>>> model.eval()
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 448, 49)
>>> # build EfficientFormer backbone for downstream task
>>> model = EfficientFormer(
...     arch="l3",
...     out_indices=(0, 1, 2, 3),
...     reshape_last_feat=True)
>>> model.eval()
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 320, 14, 14)
(1, 512, 7, 7)