MAEPretrainDecoder

class mmpretrain.models.necks.MAEPretrainDecoder(num_patches=196, patch_size=16, in_chans=3, embed_dim=1024, decoder_embed_dim=512, decoder_depth=8, decoder_num_heads=16, mlp_ratio=4, norm_cfg={'eps': 1e-06, 'type': 'LN'}, predict_feature_dim=None, init_cfg=None)[source]

Decoder for MAE Pre-training.

Some of the code is borrowed from https://github.com/facebookresearch/mae.

Parameters:
  • num_patches (int) – The number of total patches. Defaults to 196.

  • patch_size (int) – Image patch size. Defaults to 16.

  • in_chans (int) – The channel of input image. Defaults to 3.

  • embed_dim (int) – Encoder’s embedding dimension. Defaults to 1024.

  • decoder_embed_dim (int) – Decoder’s embedding dimension. Defaults to 512.

  • decoder_depth (int) – The depth of decoder. Defaults to 8.

  • decoder_num_heads (int) – Number of attention heads of decoder. Defaults to 16.

  • mlp_ratio (int) – Ratio of mlp hidden dim to decoder’s embedding dim. Defaults to 4.

  • norm_cfg (dict) – Config of the normalization layer. Defaults to LayerNorm.

  • predict_feature_dim (int, optional) – The dimension of the feature to be predicted. Defaults to None.

  • init_cfg (Union[List[dict], dict], optional) – Initialization config dict. Defaults to None.
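The defaults match a ViT-L/16 encoder on 224x224 inputs. As a quick sanity check of the shape relationships implied by the parameters (a sketch assuming the standard ViT patching scheme; the helper names below are illustrative, not part of mmpretrain):

```python
# Derive num_patches and the per-patch output dimension from the
# patch settings (assumes 224x224 inputs and standard ViT patching).
img_size = 224
patch_size = 16
in_chans = 3

num_patches = (img_size // patch_size) ** 2    # 14 * 14 patch grid
pixels_per_patch = patch_size ** 2 * in_chans  # flattened RGB patch

print(num_patches)       # 196, matching the num_patches default
print(pixels_per_patch)  # 768, matching the last dim in the example below
```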

Example

>>> from mmpretrain.models import MAEPretrainDecoder
>>> import torch
>>> self = MAEPretrainDecoder()
>>> self.eval()
>>> inputs = torch.rand(1, 50, 1024)
>>> ids_restore = torch.arange(0, 196).unsqueeze(0)
>>> level_outputs = self.forward(inputs, ids_restore)
>>> print(tuple(level_outputs.shape))
(1, 196, 768)

property decoder_norm

The normalization layer of decoder.

forward(x, ids_restore)[source]

The forward function.

The forward pass combines the visible patches' feature vectors with mask tokens, restores the original patch order, and decodes them into the output feature vectors used for reconstruction.

Parameters:
  • x (torch.Tensor) – Hidden features of the visible (unmasked) patches, of shape B x L_visible x C, where L_visible includes the class token (e.g. 50 in the example above, for a 75% mask ratio on 196 patches).

  • ids_restore (torch.Tensor) – Indices that restore the shuffled patch sequence to the original order.

Returns:

The reconstructed feature vectors, of shape B x num_patches x C.

Return type:

torch.Tensor
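The role of ids_restore can be illustrated with a small NumPy sketch of MAE-style random masking and the gather that puts mask tokens back in their original positions (the names here are illustrative; the real decoder performs the equivalent torch.gather on feature tensors):

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, mask_ratio = 8, 0.5
num_visible = int(num_patches * (1 - mask_ratio))

# Shuffle patch indices; the first num_visible entries are kept visible.
ids_shuffle = rng.permutation(num_patches)
# ids_restore[i] is the position of original patch i in the shuffled order.
ids_restore = np.argsort(ids_shuffle)

# Decoder input: visible tokens followed by mask tokens (-1 stands in
# for the learned mask token; real inputs are feature vectors).
visible = ids_shuffle[:num_visible]
tokens = np.concatenate([visible, np.full(num_patches - num_visible, -1)])

# Gather with ids_restore so every token returns to its original index.
restored = tokens[ids_restore]

# Visible positions now hold their own index; masked positions hold -1.
print(restored)
```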

init_weights()[source]

Initialize the position embedding and the mask token of the MAE decoder.
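In MAE the decoder's position embedding is initialized with fixed sinusoidal values rather than learned ones. A minimal 1-D sketch of the sin-cos construction (the actual decoder uses the 2-D grid variant; `sincos_pos_embed` is an illustrative helper, not an mmpretrain API):

```python
import numpy as np

def sincos_pos_embed(num_positions: int, dim: int) -> np.ndarray:
    """Fixed 1-D sine-cosine position embedding (illustrative helper)."""
    assert dim % 2 == 0
    # Geometric frequency schedule, as in the original Transformer.
    omega = 1.0 / 10000 ** (np.arange(dim // 2) / (dim // 2))
    pos = np.arange(num_positions)[:, None] * omega[None, :]   # (N, dim/2)
    return np.concatenate([np.sin(pos), np.cos(pos)], axis=1)  # (N, dim)

# 196 patches plus the class token, at the decoder embedding dim.
pe = sincos_pos_embed(num_positions=197, dim=512)
print(pe.shape)  # (197, 512)
```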
