DeiTClsHead

class mmpretrain.models.heads.DeiTClsHead(num_classes, in_channels, hidden_dim=None, act_cfg={'type': 'Tanh'}, init_cfg={'layer': 'Linear', 'type': 'Constant', 'val': 0}, **kwargs)

Distilled Vision Transformer classifier head.

Compared with VisionTransformerClsHead, this head adds an extra linear layer to handle the dist_token. The final classification score is the average of the two linear projections: one applied to cls_token and one applied to dist_token.
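Written out as a formula, the averaging described above looks roughly like the following. The symbols are illustrative and not taken from the library: x_cls and x_dist stand for the cls_token and dist_token features (or their hidden-layer outputs when hidden_dim is set), and each W, b pair is one linear classifier with num_classes outputs.

```latex
\mathbf{s} = \tfrac{1}{2}\left( W_{\mathrm{cls}}\,\mathbf{x}_{\mathrm{cls}} + \mathbf{b}_{\mathrm{cls}}
           + W_{\mathrm{dist}}\,\mathbf{x}_{\mathrm{dist}} + \mathbf{b}_{\mathrm{dist}} \right),
\qquad W_{\mathrm{cls}}, W_{\mathrm{dist}} \in \mathbb{R}^{\texttt{num\_classes} \times d}
```

Here d is in_channels when there is no hidden layer, and hidden_dim otherwise.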

Parameters:

num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
hidden_dim (int, optional) – Number of dimensions of the hidden layer. Defaults to None, which means no extra hidden layer.
act_cfg (dict) – The activation config. Only available during pre-training. Defaults to dict(type='Tanh').
init_cfg (dict) – The extra initialization configs. Defaults to dict(type='Constant', layer='Linear', val=0).
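As a rough illustration of how these parameters are typically supplied, here is a sketch of the head section of an MMEngine-style Python config. The concrete values are assumptions chosen for illustration: 1000 classes corresponds to ImageNet-1k, and 768 is the embedding width of DeiT-Base, which in_channels must match.

```python
# Hypothetical head section of a model config; values are illustrative only.
head = dict(
    type='DeiTClsHead',
    num_classes=1000,   # e.g. ImageNet-1k
    in_channels=768,    # must equal the backbone's token width (768 for DeiT-Base)
    hidden_dim=None,    # default: no extra hidden layer
    act_cfg=dict(type='Tanh'),  # default activation
    init_cfg=dict(type='Constant', layer='Linear', val=0),  # default init
)
```

Only type, num_classes, and in_channels need to be given explicitly; the remaining keys repeat the documented defaults.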

forward(feats)

The forward process. Computes a classification score from cls_token and from dist_token, then returns their average.
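The scoring step can be sketched in dependency-free Python. This is not the library code, which uses PyTorch linear layers. The weight and bias arguments (head_w, head_b, dist_w, dist_b) are hypothetical names for the two classifiers, and the tokens are assumed to have already gone through pre_logits.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> Vector:
    """y = W x + b, with weight stored as (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def deit_head_forward(cls_token: Sequence[float], dist_token: Sequence[float],
                      head_w: Matrix, head_b: Sequence[float],
                      dist_w: Matrix, dist_b: Sequence[float]) -> Vector:
    # Each token has its own classifier (hypothetical names head_* and dist_*).
    cls_score = linear(cls_token, head_w, head_b)
    dist_score = linear(dist_token, dist_w, dist_b)
    # Final score: element-wise mean of the two classifiers' outputs.
    return [(c + d) / 2 for c, d in zip(cls_score, dist_score)]
```

For example, with 2 input channels and 3 classes, `deit_head_forward([1.0, 2.0], [3.0, 0.0], W, [0.0] * 3, W, [1.0] * 3)` with `W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]` averages the scores `[1.0, 2.0, 3.0]` and `[4.0, 1.0, 4.0]` into `[2.5, 1.5, 3.5]`.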

pre_logits(feats)

The process before the final classification head.

The input feats is a tuple of lists of tensors, where each element holds the features of one backbone stage. DeiTClsHead takes the features of the last stage and, if a hidden layer exists, passes them through it.
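A minimal pure-Python sketch of this step follows, under stated assumptions. It assumes the last stage's entry is a (cls_token, dist_token) pair and that a single hidden layer, followed by the activation (Tanh by default, as in act_cfg), is shared by both tokens. The helper names and the hidden_w / hidden_b arguments are hypothetical and not part of the mmpretrain API.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> Vector:
    """y = W x + b, with weight stored as (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def deit_pre_logits(feats: Sequence[Tuple[Sequence[float], Sequence[float]]],
                    hidden_w: Optional[Matrix] = None,
                    hidden_b: Optional[Sequence[float]] = None,
                    act: Callable[[float], float] = math.tanh) -> Tuple[Vector, Vector]:
    # Keep only the last backbone stage, assumed to be (cls_token, dist_token).
    cls_token, dist_token = feats[-1]
    if hidden_w is None:
        # hidden_dim=None: the tokens are returned unchanged.
        return list(cls_token), list(dist_token)
    # Assumption: one hidden layer shared by both tokens, then the activation.
    cls_out = [act(v) for v in linear(cls_token, hidden_w, hidden_b)]
    dist_out = [act(v) for v in linear(dist_token, hidden_w, hidden_b)]
    return cls_out, dist_out
```

Earlier stages in feats are simply ignored, which mirrors the description above of using only the last stage's features.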