class mmpretrain.models.backbones.XCiT(img_size=224, patch_size=16, in_channels=3, embed_dims=768, depth=12, cls_attn_layers=2, num_heads=12, mlp_ratio=4.0, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, use_pos_embed=True, layer_scale_init_value=1.0, tokens_norm=False, out_type='cls_token', out_indices=(-1,), final_norm=True, frozen_stages=-1, bn_norm_cfg={'type': 'BN'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, act_cfg={'type': 'GELU'}, init_cfg={'layer': 'Linear', 'type': 'TruncNormal'})[source]

XCiT backbone.

A PyTorch implementation of the XCiT backbone introduced in XCiT: Cross-Covariance Image Transformers.
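For orientation, the operation that distinguishes XCiT from a standard ViT is cross-covariance attention (XCA): attention is computed between feature channels rather than between tokens, so its cost grows linearly with the number of tokens. Below is a minimal single-head NumPy sketch of the idea; the shapes and the fixed temperature `tau` are illustrative, while the real implementation is multi-headed, learns the temperature, and operates on batched tensors:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_covariance_attention(q, k, v, tau=1.0):
    """Single-head cross-covariance attention (XCA), NumPy sketch.

    q, k, v: (num_tokens, dim). The attention map is (dim, dim),
    built from the cross-covariance of l2-normalized queries and
    keys, instead of the (num_tokens, num_tokens) map of standard
    token-to-token self-attention.
    """
    # Normalize each feature channel across the token axis.
    qn = q / np.linalg.norm(q, axis=0, keepdims=True)
    kn = k / np.linalg.norm(k, axis=0, keepdims=True)
    attn = softmax((qn.T @ kn) / tau, axis=-1)  # (dim, dim) channel attention
    return v @ attn.T                           # (num_tokens, dim)

tokens = np.random.randn(196, 64)  # e.g. 14x14 patch tokens, 64-dim features
out = cross_covariance_attention(tokens, tokens, tokens)
print(out.shape)  # (196, 64)
```

Because the attention map is channel-by-channel, doubling the image resolution (and hence the token count) only doubles the cost of this step, which is the motivation for the design.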

  • img_size (int, tuple) – Input image size. Defaults to 224.

  • patch_size (int) – Patch size. Defaults to 16.

  • in_channels (int) – Number of input channels. Defaults to 3.

  • embed_dims (int) – Embedding dimension. Defaults to 768.

  • depth (int) – Depth of the vision transformer, i.e. the number of encoder layers. Defaults to 12.

  • cls_attn_layers (int) – Depth of Class attention layers. Defaults to 2.

  • num_heads (int) – Number of attention heads. Defaults to 12.

  • mlp_ratio (float) – Ratio of the MLP hidden dimension to the embedding dimension. Defaults to 4.0.

  • qkv_bias (bool) – Whether to add a bias to the qkv projection. Defaults to True.

  • drop_rate (float) – Probability of an element to be zeroed after the feed forward layer. Defaults to 0.

  • attn_drop_rate (float) – The dropout rate for attention weights. Defaults to 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults to 0.

  • use_pos_embed (bool) – Whether to use positional encoding. Defaults to True.

  • layer_scale_init_value (float) – The initial value for layer scale. Defaults to 1.

  • tokens_norm (bool) – Whether to normalize all tokens or just the cls_token in the class attention (CA) blocks. Defaults to False.

  • out_type (str) – The type of output features. Defaults to 'cls_token'.

  • out_indices (Sequence[int]) – Output from which layers. Defaults to (-1, ).

  • final_norm (bool) – Whether to apply a normalization layer to the final feature map. Defaults to True.

  • frozen_stages (int) – Layers to be frozen (all parameters fixed); 0 means to freeze only the stem stage. Defaults to -1, which means no parameters are frozen.

  • bn_norm_cfg (dict) – Config dict for the batch norm layers in LPI and ConvPatchEmbed. Defaults to dict(type='BN').

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='LN', eps=1e-6).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='GELU').

  • init_cfg (dict | list[dict], optional) – Initialization config dict.
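In OpenMMLab-style training configs, a backbone is selected by its type string and configured through the keyword arguments documented above. A minimal, hypothetical config fragment (the surrounding neck/head settings are omitted, and the values simply restate the defaults rather than a recommended recipe):

```python
# Hypothetical mmpretrain-style config fragment for the XCiT backbone.
# Values restate the documented defaults; they are not a tuned recipe.
model = dict(
    backbone=dict(
        type='XCiT',
        img_size=224,
        patch_size=16,
        embed_dims=768,
        depth=12,
        num_heads=12,
        out_type='cls_token',
    ),
)
```

Any parameter from the signature above that is omitted from the dict keeps its default value.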
