MILANPretrainDecoder

class mmpretrain.models.necks.MILANPretrainDecoder(num_patches=196, patch_size=16, in_chans=3, embed_dim=1024, decoder_embed_dim=512, decoder_depth=8, decoder_num_heads=16, predict_feature_dim=512, mlp_ratio=4, norm_cfg={'eps': 1e-06, 'type': 'LN'}, init_cfg=None)

Prompt decoder for MILAN.

This decoder is used in MILAN pretraining. It does not update the visible tokens produced by the encoder.

Parameters:
  • num_patches (int) – The total number of patches. Defaults to 196.

  • patch_size (int) – Image patch size. Defaults to 16.

  • in_chans (int) – The number of input image channels. Defaults to 3.

  • embed_dim (int) – Encoder’s embedding dimension. Defaults to 1024.

  • decoder_embed_dim (int) – Decoder’s embedding dimension. Defaults to 512.

  • decoder_depth (int) – The depth of the decoder. Defaults to 8.

  • decoder_num_heads (int) – Number of attention heads in the decoder. Defaults to 16.

  • predict_feature_dim (int) – The dimension of the feature to be predicted. Defaults to 512.

  • mlp_ratio (int) – Ratio of the MLP hidden dimension to the decoder’s embedding dimension. Defaults to 4.

  • norm_cfg (dict) – Config dict for the normalization layer. Defaults to dict(type='LN', eps=1e-6).

  • init_cfg (Union[List[dict], dict], optional) – Initialization config dict. Defaults to None.
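
As an illustration of how these parameters fit together, the following is a minimal sketch of a registry-style config dict for this neck. It assumes the decoder is built from a dict with a `type` key, as is usual in mmpretrain config files, and it assumes a 224x224 input image (the resolution is not stated on this page). All other values are the defaults from the signature above.

```python
# Assumed input resolution; this page does not state it.
img_size = 224
patch_size = 16

# A square image split into non-overlapping patches gives
# (img_size / patch_size) ** 2 patches in total.
num_patches = (img_size // patch_size) ** 2

# Hypothetical config fragment; every value is a documented default.
neck = dict(
    type='MILANPretrainDecoder',
    num_patches=num_patches,
    patch_size=patch_size,
    in_chans=3,
    embed_dim=1024,           # must match the encoder's output dimension
    decoder_embed_dim=512,
    decoder_depth=8,
    decoder_num_heads=16,
    predict_feature_dim=512,
    mlp_ratio=4,
    norm_cfg=dict(type='LN', eps=1e-6),
)
```

Under the assumed resolution, `num_patches` works out to 14 x 14 = 196, which agrees with the documented default.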

forward(x, ids_restore, ids_keep, ids_dump)

Forward function.

Parameters:
  • x (torch.Tensor) – The input features, of shape (N, L, C).

  • ids_restore (torch.Tensor) – The indices used to restore the tokens to their original order in the image.

  • ids_keep (torch.Tensor) – The indices of tokens to be kept.

  • ids_dump (torch.Tensor) – The indices of tokens to be masked.

Returns:

The reconstructed features, of shape (N, L, C).

Return type:

torch.Tensor
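
The relationship between `ids_keep`, `ids_dump` and `ids_restore` can be sketched as follows. This is an illustrative, single-sample example using plain Python lists, not the library's code: the real arguments are batched `torch.Tensor` objects. The helper `make_mask_indices`, the random shuffle, and the 0.75 mask ratio are assumptions chosen for illustration; this page does not describe how the indices are actually produced for MILAN.

```python
import random


def make_mask_indices(num_patches, mask_ratio, seed=0):
    """Hypothetical helper: split patch indices into kept and masked sets."""
    rng = random.Random(seed)
    len_keep = int(num_patches * (1 - mask_ratio))

    # A random permutation of the patch indices.
    ids_shuffle = list(range(num_patches))
    rng.shuffle(ids_shuffle)

    ids_keep = ids_shuffle[:len_keep]   # tokens that stay visible
    ids_dump = ids_shuffle[len_keep:]   # tokens that are masked

    # ids_restore[p] is the position of patch p in the shuffled order,
    # i.e. the inverse permutation of ids_shuffle.
    ids_restore = [0] * num_patches
    for pos, patch in enumerate(ids_shuffle):
        ids_restore[patch] = pos
    return ids_keep, ids_dump, ids_restore


ids_keep, ids_dump, ids_restore = make_mask_indices(196, 0.75)

# Kept tokens followed by masked tokens form the shuffled sequence;
# indexing it with ids_restore recovers the original patch order.
shuffled = ids_keep + ids_dump
restored = [shuffled[ids_restore[p]] for p in range(196)]
```

In this sketch, `ids_keep` and `ids_dump` partition the patch indices, and `ids_restore` is the inverse permutation that puts a "kept then masked" sequence back into image order.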