MultiheadAttention

class mmpretrain.models.utils.MultiheadAttention(embed_dims, num_heads, input_dims=None, attn_drop=0.0, proj_drop=0.0, dropout_layer={'drop_prob': 0.0, 'type': 'Dropout'}, qkv_bias=True, qk_scale=None, proj_bias=True, v_shortcut=False, use_layer_scale=False, layer_scale_init_value=0.0, init_cfg=None)

Multi-head Attention Module.

This module implements multi-head attention with support for different input and embedding dimensions. It also supports a shortcut from the value tensor to the output, which is useful when input_dims differs from embed_dims. A usage sketch follows the parameter list.

Parameters:
  • embed_dims (int) – The embedding dimension.

  • num_heads (int) – Parallel attention heads.

  • input_dims (int, optional) – The input dimension. If None, embed_dims is used. Defaults to None.

  • attn_drop (float) – Dropout rate applied to the attention weights computed from query and key. Defaults to 0.

  • proj_drop (float) – Dropout rate of the dropout layer after the output projection. Defaults to 0.

  • dropout_layer (dict) – The dropout config before adding the shortcut. Defaults to dict(type='Dropout', drop_prob=0.).

  • qkv_bias (bool) – If True, add a learnable bias to q, k, v. Defaults to True.

  • qk_scale (float, optional) – Override default qk scale of head_dim ** -0.5 if set. Defaults to None.

  • proj_bias (bool) – If True, add a learnable bias to the output projection. Defaults to True.

  • v_shortcut (bool) – Add a shortcut from value to output. It’s usually used if input_dims is different from embed_dims. Defaults to False.

  • use_layer_scale (bool) – Whether to use layer scale. Defaults to False.

  • layer_scale_init_value (float or torch.Tensor) – Init value of layer scale. Defaults to 0.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.
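
Example (usage sketch):

A minimal sketch of constructing and calling the module, assuming it is a standard torch.nn.Module whose forward takes a single token tensor of shape (B, N, input_dims) and returns (B, N, embed_dims). The tensor shapes, and the num_heads=1 choice in the value-shortcut case, are illustrative assumptions rather than requirements stated above.

>>> import torch
>>> from mmpretrain.models.utils import MultiheadAttention
>>> # Plain case: input_dims defaults to embed_dims.
>>> attn = MultiheadAttention(embed_dims=256, num_heads=8)
>>> x = torch.randn(2, 197, 256)  # (batch, num_tokens, embed_dims)
>>> attn(x).shape                 # assumed forward: a single token tensor
torch.Size([2, 197, 256])
>>> # Value-shortcut case: input_dims differs from embed_dims, so a residual
>>> # from the raw input would not match in width; v_shortcut adds the value
>>> # tensor to the output instead (num_heads=1 assumed here so the value
>>> # matches the output width).
>>> attn_vs = MultiheadAttention(
...     embed_dims=64, num_heads=1, input_dims=256, v_shortcut=True)
>>> attn_vs(x).shape              # (2, 197, 256) -> (2, 197, 64)
torch.Size([2, 197, 64])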
