iTPNHiViT
- class mmpretrain.models.selfsup.iTPNHiViT(arch='base', img_size=224, patch_size=16, inner_patches=4, stem_mlp_ratio=3.0, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, ape=True, rpe=False, layer_scale_init_value=0.0, mask_ratio=0.75, reconstruction_type='pixel', **kwargs)
HiViT for iTPN pre-training.
- Parameters:
inner_patches (int) – Inner patch. Defaults to 4.
stem_mlp_ratio (float) – Ratio of MLP hidden dim to embedding dim in the first two stages. Defaults to 3.0.
mlp_ratio (float) – Ratio of MLP hidden dim to embedding dim in the last stage. Defaults to 4.0.
qkv_bias (bool) – Enable bias for qkv projections if True. Defaults to True.
qk_scale (float, optional) – Custom scale applied to the q @ k attention logits. Defaults to None.
drop_rate (float) – Probability of an element to be zeroed. Defaults to 0.
attn_drop_rate (float) – Dropout rate for attention output weights. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).
ape (bool) – If True, add absolute position embedding to the patch embedding. Defaults to True.
rpe (bool) – If True, add relative position embedding to the patch embedding. Defaults to False.
layer_scale_init_value (float) – Layer-scale init values. Defaults to 0.
mask_ratio (float) – The fraction of patches to be masked. Defaults to 0.75.
reconstruction_type (str) – The reconstruction target for self-supervised learning. Defaults to 'pixel'.
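As a rough illustration of how these parameters fit together, the sketch below writes the documented defaults as a config dict and works out how many tokens would stay visible under the default mask ratio. Two things are assumptions rather than facts from this page: that the class is registered under the name 'iTPNHiViT', and that the token count at the masking step is (img_size // patch_size) squared.

```python
# Config-style dict using the defaults from the signature above.
# Assumption: the class is registered under the name 'iTPNHiViT'.
backbone_cfg = dict(
    type='iTPNHiViT',
    arch='base',
    img_size=224,
    patch_size=16,
    mask_ratio=0.75,
    reconstruction_type='pixel',
)

# Assumption: one token per patch_size x patch_size patch at the masking step.
patches_per_side = backbone_cfg['img_size'] // backbone_cfg['patch_size']
num_patches = patches_per_side ** 2

# mask_ratio is the fraction that is hidden, so the rest stays visible.
num_visible = int(num_patches * (1 - backbone_cfg['mask_ratio']))
num_masked = num_patches - num_visible

print(num_patches, num_visible, num_masked)
```

With these defaults the arithmetic gives 196 patches, of which 49 are visible and 147 are masked.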
- forward(x, mask=True)
Generate features for masked images.
The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly hides some patches and returns the hidden features of the visible patches, i.e. it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from images without a mask.
- Parameters:
x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.
- Returns:
Hidden features, mask and the ids to restore the original image.
x (torch.Tensor): hidden features, which is of shape B x (L * mask_ratio) x C.
mask (torch.Tensor): mask used to mask the image.
ids_restore (torch.Tensor): ids to restore the original image.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
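The masked branch described above can be sketched without any framework. The following is a minimal, MAE-style random masking routine in plain Python, not the library's implementation: the function name random_masking, its seed argument, and the use of lists instead of tensors are all illustrative assumptions. It shows one way to produce the kept indices, a binary mask, and the restore indices for a single sample.

```python
import random


def random_masking(num_patches, mask_ratio, seed=None):
    """Hypothetical per-sample random masking (MAE-style sketch).

    Returns (ids_keep, mask, ids_restore) as plain Python lists.
    """
    rng = random.Random(seed)
    len_keep = int(num_patches * (1 - mask_ratio))

    # One random score per patch; sorting the scores yields a random shuffle.
    noise = [rng.random() for _ in range(num_patches)]
    ids_shuffle = sorted(range(num_patches), key=noise.__getitem__)

    # ids_restore is the inverse permutation of ids_shuffle.
    ids_restore = [0] * num_patches
    for rank, idx in enumerate(ids_shuffle):
        ids_restore[idx] = rank

    # The first len_keep patches of the shuffle stay visible.
    ids_keep = ids_shuffle[:len_keep]

    # Binary mask in the original patch order: 0 = visible, 1 = masked.
    mask = [0 if ids_restore[i] < len_keep else 1 for i in range(num_patches)]
    return ids_keep, mask, ids_restore


ids_keep, mask, ids_restore = random_masking(196, 0.75, seed=0)
print(len(ids_keep), sum(mask))
```

A tensor version would typically do the same with a batched argsort and gather; the list form is used here only to keep the sketch dependency-free.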
- forward_clip(x, mask=True)
Generate features for masked images.
The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly hides some patches and returns the hidden features of the visible patches, i.e. it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from images without a mask.
- Parameters:
x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.
- Returns:
Hidden features, mask and the ids to restore the original image.
x (torch.Tensor): hidden features, which is of shape B x (L * mask_ratio) x C.
mask (torch.Tensor): mask used to mask the image.
ids_restore (torch.Tensor): ids to restore the original image.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
- forward_pixel(x, mask=True)
Generate features for masked images.
The function supports two kinds of forward behavior. If mask is True, the function generates a mask that randomly hides some patches and returns the hidden features of the visible patches, i.e. it runs as masked image modeling pre-training. If mask is None or False, the function calls super().forward(), which extracts features from images without a mask.
- Parameters:
x (torch.Tensor) – Input images, which is of shape B x C x H x W.
mask (bool, optional) – Whether the forward function generates a mask. Defaults to True.
- Returns:
Hidden features, mask and the ids to restore the original image.
x (torch.Tensor): hidden features, which is of shape B x (L * mask_ratio) x C.
mask (torch.Tensor): mask used to mask the image.
ids_restore (torch.Tensor): ids to restore the original image.
- Return type:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
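Each forward method returns ids_restore alongside the visible features. As a hedged illustration of what "ids to restore original image" can mean in practice, the sketch below puts visible tokens back into their original patch order and fills the masked positions with a placeholder. The helper restore_order, the string tokens, and the 4-patch example are hypothetical; how a real decoder consumes these ids is not stated on this page.

```python
def restore_order(visible_tokens, ids_restore, mask_token):
    """Hypothetical helper: undo the shuffle and fill masked slots."""
    num_patches = len(ids_restore)
    # Shuffled sequence: visible tokens first, then mask-token placeholders.
    num_missing = num_patches - len(visible_tokens)
    shuffled = list(visible_tokens) + [mask_token] * num_missing
    # ids_restore[i] is the position of original patch i in the shuffled order.
    return [shuffled[ids_restore[i]] for i in range(num_patches)]


# Toy example with 4 patches: the shuffle order was [2, 0, 3, 1],
# and the first two entries of the shuffle (patches 2 and 0) were kept.
ids_restore = [1, 3, 0, 2]  # inverse permutation of [2, 0, 3, 1]
visible = ["p2", "p0"]
restored = restore_order(visible, ids_restore, "[M]")
print(restored)  # ['p0', '[M]', 'p2', '[M]']
```

Patches 1 and 3 were masked in this toy case, so they come back as the placeholder while patches 0 and 2 return to their original positions.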