SimMIMSwinTransformer

class mmpretrain.models.selfsup.SimMIMSwinTransformer(arch='T', img_size=224, in_channels=3, drop_rate=0.0, drop_path_rate=0.1, out_indices=(3,), use_abs_pos_embed=False, with_cp=False, frozen_stages=-1, norm_eval=False, norm_cfg={'type': 'LN'}, stage_cfgs={}, patch_cfg={}, pad_small_map=False, init_cfg=None)

Swin Transformer for SimMIM pre-training.

Parameters:

arch (str | dict) – Swin Transformer architecture. Defaults to 'T'.
img_size (int | tuple) – The size of input image. Defaults to 224.
in_channels (int) – The num of input channels. Defaults to 3.
drop_rate (float) – Dropout rate after embedding. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.1.
out_indices (tuple) – Indices of the stages whose outputs are returned. Defaults to (3, ).
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults to False.
with_cp (bool) – Whether to use gradient checkpointing. Checkpointing saves memory at the cost of slower training. Defaults to False.
frozen_stages (int) – Stages to be frozen (stop gradients and set to eval mode). -1 means no parameters are frozen. Defaults to -1.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze their running stats (mean and var). Note: this affects only Batch Norm and its variants. Defaults to False.
norm_cfg (dict) – Config dict for the normalization layer at the end of the backbone. Defaults to dict(type='LN').
stage_cfgs (Sequence | dict) – Extra config dict for each stage. Defaults to empty dict.
patch_cfg (dict) – Extra config dict for patch embedding. Defaults to empty dict.
pad_small_map (bool) – If True, pad a feature map smaller than the window size up to the window size, as is common in detection and segmentation. If False, disable window shifting for such maps and shrink the window size to the feature-map size, as is common in classification. Defaults to False.
init_cfg (dict, optional) – The config for initialization. Defaults to None.
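For orientation, a backbone entry in an mmpretrain-style config might look like the dictionary below. This is a sketch: the 'B' architecture, the 192-pixel input and the window size of 6 are illustrative values assumed here, with the window size passed through the stage_cfgs argument. Assuming the standard 4x4 Swin patch embedding, a 192-pixel input gives feature maps of 48, 24, 12 and 6 tokens per side, all divisible by 6. Check the values against the SimMIM configs shipped with your mmpretrain version.

```python
# Hypothetical backbone entry for an mmpretrain-style config file.
# The values below are illustrative assumptions, not defaults.
backbone = dict(
    type='SimMIMSwinTransformer',
    arch='B',          # Swin-B preset instead of the default 'T'
    img_size=192,      # input resolution used during pre-training
    # Extra per-stage settings; here every block uses a 6x6 window.
    stage_cfgs=dict(block_cfgs=dict(window_size=6)),
)
```

In a full config this dict would sit under the model's backbone key; any argument left out falls back to the defaults listed above.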

forward(x, mask)

Generate features for masked images.

The function supports two kinds of forward behavior. If mask is not None, the forward pass runs as masked image modeling pre-training; if mask is None, it calls super().forward(), which extracts features from images without masking.
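The masked branch can be pictured with the minimal NumPy sketch below. It rests on assumptions not stated on this page: patch tokens have shape (B, L, C), the mask is given at patch resolution with 1 marking a masked patch, and masked tokens are blended with a single mask-token vector (learnable in the actual model), which is the usual SimMIM formulation. The function name forward_sketch is hypothetical, and the Swin stages themselves are omitted.

```python
import numpy as np


def forward_sketch(tokens, mask, mask_token):
    """Hypothetical illustration of SimMIM-style masking of patch tokens.

    tokens:     (B, L, C) patch embeddings
    mask:       (B, H, W) array or None; 1 marks a masked patch, H * W == L
    mask_token: (C,) vector that replaces masked tokens
    """
    if mask is None:
        # Unmasked path: return tokens unchanged
        # (the real model would call super().forward() instead).
        return tokens
    B, L, C = tokens.shape
    # Flatten the mask to one weight per token: shape (B, L, 1).
    w = mask.reshape(B, -1)[..., np.newaxis].astype(tokens.dtype)
    if w.shape[1] != L:
        raise ValueError("mask must have one entry per patch token")
    # Keep unmasked tokens and substitute the mask token where w == 1.
    return tokens * (1.0 - w) + mask_token.reshape(1, 1, C) * w


tokens = np.ones((1, 4, 2))
mask = np.array([[[0, 1], [1, 0]]])
out = forward_sketch(tokens, mask, mask_token=np.zeros(2))
print(out[0, :, 0])  # [1. 0. 0. 1.]
```

Blending with weights instead of indexing keeps the token sequence length fixed, so the windowed attention in the later stages sees a full, regular grid of tokens.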

Parameters:

x (torch.Tensor) – Input images.
mask (torch.Tensor, optional) – Masks for images, or None to extract features without masking.

Returns:

A tuple of features from multiple stages (those selected by out_indices).

Return type:

tuple
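To relate the returned tuple to concrete shapes, the small calculation below assumes the standard Swin layout: a 4x4 patch embedding, then each later stage halves the spatial resolution and doubles the channel count, with img_size divisible by 32. It also assumes each returned element is laid out as (B, C, H, W). The helper name and the embedding widths (96 for Swin-T, 128 for Swin-B) are standard Swin values assumed here, not taken from this page. With the default out_indices=(3,), the tuple holds only the last stage.

```python
def swin_stage_shapes(img_size, embed_dims, num_stages=4, patch_size=4):
    """Hypothetical helper: (channels, height, width) of each Swin stage output.

    Assumes a patch_size x patch_size patch embedding, after which each
    later stage halves the resolution and doubles the channel count.
    """
    shapes = []
    side = img_size // patch_size
    channels = embed_dims
    for stage in range(num_stages):
        if stage > 0:
            # Patch merging between stages: half resolution, double width.
            side //= 2
            channels *= 2
        shapes.append((channels, side, side))
    return shapes


# Swin-T (embed_dims=96) at 224 px; out_indices=(3,) keeps only the last entry.
print(swin_stage_shapes(224, 96))
# [(96, 56, 56), (192, 28, 28), (384, 14, 14), (768, 7, 7)]
```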

init_weights()

Initialize weights.