XCiT
- class mmpretrain.models.backbones.XCiT(img_size=224, patch_size=16, in_channels=3, embed_dims=768, depth=12, cls_attn_layers=2, num_heads=12, mlp_ratio=4.0, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, use_pos_embed=True, layer_scale_init_value=1.0, tokens_norm=False, out_type='cls_token', out_indices=(-1,), final_norm=True, frozen_stages=-1, bn_norm_cfg={'type': 'BN'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, act_cfg={'type': 'GELU'}, init_cfg={'layer': 'Linear', 'type': 'TruncNormal'})
XCiT backbone.
A PyTorch implementation of the backbone introduced in the paper XCiT: Cross-Covariance Image Transformers.
- Parameters:
img_size (int | tuple) – Input image size. Defaults to 224.
patch_size (int) – Patch size. Defaults to 16.
in_channels (int) – Number of input channels. Defaults to 3.
embed_dims (int) – Embedding dimension. Defaults to 768.
depth (int) – Depth (number of encoder layers) of the vision transformer. Defaults to 12.
cls_attn_layers (int) – Depth of the class attention layers. Defaults to 2.
num_heads (int) – Number of attention heads. Defaults to 12.
mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Defaults to 4.
qkv_bias (bool) – Whether to add a learnable bias to the query, key, and value projections. Defaults to True.
drop_rate (float) – Probability of an element to be zeroed after the feed-forward layer. Defaults to 0.
attn_drop_rate (float) – The dropout rate for attention output weights. Defaults to 0.
drop_path_rate (float) – Stochastic depth rate. Defaults to 0.
use_pos_embed (bool) – Whether to use positional encoding. Defaults to True.
layer_scale_init_value (float) – The initial value for layer scale. Defaults to 1.
tokens_norm (bool) – Whether to normalize all tokens or just the cls_token in the class attention (CA) blocks. Defaults to False.
out_type (str) – The type of output features. Defaults to 'cls_token'.
out_indices (Sequence[int]) – Output from which layers. Defaults to (-1, ).
final_norm (bool) – Whether to normalize the final feature map. Defaults to True.
frozen_stages (int) – Layers to be frozen (all parameters fixed). 0 means freezing only the stem stage. Defaults to -1, which means no parameters are frozen.
bn_norm_cfg (dict) – Config dict for the batch norm layers in LPI and ConvPatchEmbed. Defaults to dict(type='BN').
norm_cfg (dict) – Config dict for the normalization layer. Defaults to dict(type='LN', eps=1e-6).
act_cfg (dict) – Config dict for the activation layer. Defaults to dict(type='GELU').
init_cfg (dict | list[dict], optional) – Initialization config dict.
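- Example (a minimal usage sketch, not part of the original docstring: it assumes torch and mmpretrain are installed, and relies on the standard mmpretrain convention that a backbone's forward pass returns a tuple with one tensor per entry in out_indices):

>>> import torch
>>> from mmpretrain.models.backbones import XCiT
>>> # Build the backbone with the defaults documented above
>>> # (embed_dims=768, depth=12, out_type='cls_token', out_indices=(-1,)).
>>> model = XCiT()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = model(inputs)
>>> # With out_type='cls_token', each output tensor is the class token
>>> # of shape (batch_size, embed_dims).
>>> print(outputs[-1].shape)
torch.Size([1, 768])

Other values of out_type change the shape of each returned tensor (for example, a spatial feature map instead of the class token), but the tuple always contains one entry per index in out_indices.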