class mmpretrain.models.backbones.EdgeNeXt(arch='xxsmall', in_channels=3, global_blocks=[0, 1, 1, 1], global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'], drop_path_rate=0.0, layer_scale_init_value=1e-06, linear_pw_conv=True, mlp_ratio=4, conv_kernel_sizes=[3, 5, 7, 9], use_pos_embd_csa=[False, True, False, False], use_pos_embd_global=False, d2_scales=[2, 2, 3, 4], norm_cfg={'eps': 1e-06, 'type': 'LN2d'}, out_indices=-1, frozen_stages=0, gap_before_final_norm=True, act_cfg={'type': 'GELU'}, init_cfg=None)[source]


A PyTorch implementation of: EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

Inspiration from

  • arch (str | dict) –

    The model’s architecture. If a string, it should be one of the architectures in EdgeNeXt.arch_settings. If a dict, it should include the following keys:

    • channels (list[int]): The number of channels at each stage.

    • depths (list[int]): The number of blocks at each stage.

    • num_heads (list[int]): The number of heads at each stage.

    Defaults to ‘xxsmall’.

  • in_channels (int) – The number of input channels. Defaults to 3.

  • global_blocks (list[int]) – The number of global blocks at each stage. Defaults to [0, 1, 1, 1].

  • global_block_type (list[str]) – The type of global blocks. Defaults to [‘None’, ‘SDTA’, ‘SDTA’, ‘SDTA’].

  • drop_path_rate (float) – Stochastic depth dropout rate. Defaults to 0.

  • layer_scale_init_value (float) – Initial value of layer scale. Defaults to 1e-6.

  • linear_pw_conv (bool) – Whether to use a linear layer for pointwise convolution. Defaults to True.

  • mlp_ratio (int) – The channel expansion ratio in MLP layers. Defaults to 4.

  • conv_kernel_sizes (list[int]) – The kernel size of convolutional layers at each stage. Defaults to [3, 5, 7, 9].

  • use_pos_embd_csa (list[bool]) – Whether to use positional embedding in Channel Self-Attention. Defaults to [False, True, False, False].

  • use_pos_embd_global (bool) – Whether to use positional embedding for the whole network. Defaults to False.

  • d2_scales (list[int]) – The number of channel groups used for SDTA at each stage. Defaults to [2, 2, 3, 4].

  • norm_cfg (dict) – The config of normalization layer. Defaults to dict(type='LN2d', eps=1e-6).

  • out_indices (Sequence | int) – The stages to output from. Defaults to -1, which means the last stage.

  • frozen_stages (int) – Stages to be frozen (all parameters fixed). Defaults to 0, which means not freezing any parameters.

  • gap_before_final_norm (bool) – Whether to globally average the feature map before the final norm layer. Defaults to True.

  • act_cfg (dict) – The config of activation layer. Defaults to dict(type='GELU').

  • init_cfg (dict, optional) – Config for initialization. Defaults to None.
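
Several of the parameters above are per-stage lists, and a dict-valued arch must carry the channels, depths and num_heads keys. The sketch below builds a backbone config dict in the usual OpenMMLab style and checks that those lists agree in length. The helper check_per_stage is hypothetical (it is not part of mmpretrain), the four-stage count is inferred from the four-element defaults, and the arch values are illustrative only.

```python
# Assumption: four stages, inferred from the four-element defaults documented above.
NUM_STAGES = 4

# Per-stage keyword arguments taken from the parameter list above.
PER_STAGE_KEYS = (
    'global_blocks',
    'global_block_type',
    'conv_kernel_sizes',
    'use_pos_embd_csa',
    'd2_scales',
)


def check_per_stage(cfg, num_stages=NUM_STAGES):
    """Hypothetical helper: list the entries whose length is not num_stages."""
    problems = []
    arch = cfg.get('arch')
    if isinstance(arch, dict):
        # A dict-valued arch must provide these three keys.
        for key in ('channels', 'depths', 'num_heads'):
            if key not in arch:
                problems.append(f'arch.{key} (missing)')
            elif len(arch[key]) != num_stages:
                problems.append(f'arch.{key}')
    for key in PER_STAGE_KEYS:
        if key in cfg and len(cfg[key]) != num_stages:
            problems.append(key)
    return problems


backbone_cfg = dict(
    type='EdgeNeXt',
    # Illustrative custom architecture; the values are not taken from this page.
    arch=dict(
        channels=[24, 48, 88, 168],
        depths=[2, 2, 6, 2],
        num_heads=[4, 4, 4, 4],
    ),
    in_channels=3,
    global_blocks=[0, 1, 1, 1],
    global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
    conv_kernel_sizes=[3, 5, 7, 9],
    use_pos_embd_csa=[False, True, False, False],
    d2_scales=[2, 2, 3, 4],
    out_indices=-1,
)

print(check_per_stage(backbone_cfg))  # []
```

With mmpretrain installed, the same keyword arguments (without the type entry) could presumably be passed straight to the EdgeNeXt constructor shown in the signature; the dict form is the one that would normally sit in a model config.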
