EdgeNeXt

class mmpretrain.models.backbones.EdgeNeXt(arch='xxsmall', in_channels=3, global_blocks=[0, 1, 1, 1], global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'], drop_path_rate=0.0, layer_scale_init_value=1e-06, linear_pw_conv=True, mlp_ratio=4, conv_kernel_sizes=[3, 5, 7, 9], use_pos_embd_csa=[False, True, False, False], use_pos_embd_global=False, d2_scales=[2, 2, 3, 4], norm_cfg={'eps': 1e-06, 'type': 'LN2d'}, out_indices=-1, frozen_stages=0, gap_before_final_norm=True, act_cfg={'type': 'GELU'}, init_cfg=None)

EdgeNeXt.

A PyTorch implementation of: EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

Inspired by https://github.com/mmaaz60/EdgeNeXt

Parameters:

arch (str | dict) –
The model’s architecture. If a string, it should be one of the architectures in EdgeNeXt.arch_settings. If a dict, it should include the following keys:
- channels (list[int]): The number of channels at each stage.
- depths (list[int]): The number of blocks at each stage.
- num_heads (list[int]): The number of heads at each stage.
Defaults to ‘xxsmall’.
in_channels (int) – The number of input channels. Defaults to 3.
global_blocks (list[int]) – The number of global blocks at each stage. Defaults to [0, 1, 1, 1].
global_block_type (list[str]) – The type of global blocks. Defaults to [‘None’, ‘SDTA’, ‘SDTA’, ‘SDTA’].
drop_path_rate (float) – Stochastic depth dropout rate. Defaults to 0.
layer_scale_init_value (float) – Initial value of layer scale. Defaults to 1e-6.
linear_pw_conv (bool) – Whether to use a linear layer for the pointwise convolution. Defaults to True.
mlp_ratio (int) – The channel expansion ratio in MLP layers. Defaults to 4.
conv_kernel_sizes (list[int]) – The kernel size of convolutional layers at each stage. Defaults to [3, 5, 7, 9].
use_pos_embd_csa (list[bool]) – Whether to use positional embedding in Channel Self-Attention. Defaults to [False, True, False, False].
use_pos_embd_global (bool) – Whether to use positional embedding for the whole network. Defaults to False.
d2_scales (list[int]) – The number of channel groups used for SDTA at each stage. Defaults to [2, 2, 3, 4].
norm_cfg (dict) – The config of normalization layer. Defaults to dict(type='LN2d', eps=1e-6).
out_indices (Sequence | int) – The stages from which to return output. Defaults to -1, which means the last stage.
frozen_stages (int) – Stages to be frozen (all parameters fixed). Defaults to 0, which means no parameters are frozen.
gap_before_final_norm (bool) – Whether to globally average the feature map before the final norm layer. Defaults to True.
act_cfg (dict) – The config of activation layer. Defaults to dict(type='GELU').
init_cfg (dict, optional) – Config for initialization. Defaults to None.
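
Example:

The parameters above can be put together as a config dict. What follows is a minimal sketch, not an official recipe. The numbers in `custom_arch` are illustrative placeholders rather than a published EdgeNeXt variant. `check_per_stage_lengths` is a hypothetical helper that is not part of mmpretrain. The assumption of four stages is inferred only from the four-element defaults in the signature.

```python
# Illustrative custom architecture (placeholder values, assumed for this sketch).
custom_arch = dict(
    channels=[24, 48, 88, 168],  # channels at each stage
    depths=[2, 2, 6, 2],         # blocks at each stage
    num_heads=[4, 4, 4, 4],      # attention heads at each stage
)

# Backbone config in the dict style used by mmpretrain config files.
backbone_cfg = dict(
    type='EdgeNeXt',
    arch=custom_arch,
    in_channels=3,
    conv_kernel_sizes=[3, 5, 7, 9],
    d2_scales=[2, 2, 3, 4],
    drop_path_rate=0.1,
    out_indices=-1,   # last stage only
    frozen_stages=0,  # freeze nothing
)


def check_per_stage_lengths(cfg, num_stages=4):
    """Hypothetical helper: verify that per-stage lists have one entry per stage.

    Assumes `cfg['arch']` is a dict (not a preset name) and that the
    network has `num_stages` stages.
    """
    arch = cfg['arch']
    required = ('channels', 'depths', 'num_heads')
    missing = [key for key in required if key not in arch]
    if missing:
        raise KeyError(f'arch is missing keys: {missing}')

    # Collect every per-stage list that is present in the config.
    per_stage = {key: arch[key] for key in required}
    optional = ('global_blocks', 'global_block_type', 'conv_kernel_sizes',
                'use_pos_embd_csa', 'd2_scales')
    for key in optional:
        if key in cfg:
            per_stage[key] = cfg[key]

    bad = {k: len(v) for k, v in per_stage.items() if len(v) != num_stages}
    if bad:
        raise ValueError(f'expected {num_stages} entries, got: {bad}')
    return True


check_per_stage_lengths(backbone_cfg)
```

With mmpretrain installed, a dict of this shape is normally placed under the `backbone` key of a model config. Arguments left out of the dict fall back to the defaults listed above.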