

class mmpretrain.models.classifiers.ImageClassifier(backbone, neck=None, head=None, pretrained=None, train_cfg=None, data_preprocessor=None, init_cfg=None)[source]

Image classifier for supervised classification tasks.

Parameters:

  • backbone (dict) – The backbone module. See mmpretrain.models.backbones.

  • neck (dict, optional) – The neck module to process features from backbone. See mmpretrain.models.necks. Defaults to None.

  • head (dict, optional) – The head module that makes predictions and calculates the loss from the processed features. See mmpretrain.models.heads. Note that if the head is not set, almost no method other than extract_feat() can be used. Defaults to None.

  • pretrained (str, optional) – The path to the pretrained checkpoint. Both local and remote paths are supported. Defaults to None.

  • train_cfg (dict, optional) –

    The training setting. The acceptable fields are:

    • augments (List[dict]): The batch augmentation methods to use. More details can be found in mmpretrain.model.utils.augment.

    • probs (List[float], optional): The probability of applying each batch augmentation method. If None, the methods are chosen with equal probability. Defaults to None.

    Defaults to None.

  • data_preprocessor (dict, optional) – The config for preprocessing input data. If None, or if no type is specified, “ClsDataPreprocessor” is used as the type. See ClsDataPreprocessor for more details. Defaults to None.

  • init_cfg (dict, optional) – The config that controls the initialization. Defaults to None.
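
As a rough illustration of how the arguments above fit together, the sketch below writes a model config as a plain Python dict. The concrete type names and values (ResNet, GlobalAveragePooling, LinearClsHead, Mixup, CutMix, and every number) are illustrative assumptions, not something this reference prescribes; swap in the components you actually use.

```python
# Illustrative model config; every type name and number here is an assumption.
model = dict(
    type='ImageClassifier',
    backbone=dict(type='ResNet', depth=18, num_stages=4, out_indices=(3,)),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=1000,
        in_channels=512,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
    ),
    train_cfg=dict(
        # Batch augmentations and the probability of picking each one.
        augments=[
            dict(type='Mixup', alpha=0.8),
            dict(type='CutMix', alpha=1.0),
        ],
        probs=[0.5, 0.5],
    ),
    # No 'type' key, so the default preprocessor type is used.
    data_preprocessor=dict(
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True,
    ),
)
```

Keeping `probs` the same length as `augments` makes the pairing between each method and its probability explicit.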

extract_feat(inputs, stage='neck')[source]

Extract features from the input tensor with shape (N, C, …).

Parameters:

  • inputs (Tensor) – A batch of inputs. Its shape should be (num_samples, num_channels, *img_shape).

  • stage (str) –

    The stage whose features are returned. Choose from:

    • “backbone”: The output of the backbone network. Returns a tuple of features from multiple stages.

    • “neck”: The output of the neck module. Returns a tuple of features from multiple stages.

    • “pre_logits”: The feature before the final linear classification layer. Usually returns a tensor.

    Defaults to “neck”.


Returns:

The output of the specified stage. The output depends on the detailed implementation. In general, the backbone and neck outputs are tuples and the pre_logits output is a tensor.

Return type:

tuple | Tensor
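
The stage selection described above can be pictured with a small toy class. This is only a sketch of the control flow under stated assumptions: `ToyClassifier` and its lambda components are hypothetical stand-ins that operate on plain numbers, not the library's implementation.

```python
class ToyClassifier:
    """Toy stand-in that mimics the stage selection, not the real class."""

    def __init__(self, backbone, neck=None, pre_logits=None):
        self.backbone = backbone
        self.neck = neck
        self.pre_logits = pre_logits

    def extract_feat(self, inputs, stage='neck'):
        if stage not in ('backbone', 'neck', 'pre_logits'):
            raise ValueError(f'Invalid stage "{stage}".')
        feats = self.backbone(inputs)
        if stage == 'backbone':
            return feats
        # The neck is optional; without it the backbone output passes through.
        if self.neck is not None:
            feats = self.neck(feats)
        if stage == 'neck':
            return feats
        if self.pre_logits is None:
            raise RuntimeError('No head with a pre_logits step is configured.')
        return self.pre_logits(feats)


# Hypothetical components operating on plain numbers instead of tensors.
toy = ToyClassifier(
    backbone=lambda x: (x + 1, x + 2),                 # tuple of stage features
    neck=lambda feats: tuple(f * 10 for f in feats),   # per-stage processing
    pre_logits=lambda feats: feats[-1],                # keep the last stage only
)
print(toy.extract_feat(0, stage='backbone'))    # (1, 2)
print(toy.extract_feat(0, stage='neck'))        # (10, 20)
print(toy.extract_feat(0, stage='pre_logits'))  # 20
```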


  1. Backbone output

>>> import torch
>>> from mmengine import Config
>>> from mmpretrain.models import build_classifier
>>> cfg = Config.fromfile('configs/resnet/').model
>>> cfg.backbone.out_indices = (0, 1, 2, 3)  # Output multi-scale feature maps
>>> model = build_classifier(cfg)
>>> outs = model.extract_feat(torch.rand(1, 3, 224, 224), stage='backbone')
>>> for out in outs:
...     print(out.shape)
torch.Size([1, 64, 56, 56])
torch.Size([1, 128, 28, 28])
torch.Size([1, 256, 14, 14])
torch.Size([1, 512, 7, 7])

  2. Neck output

>>> import torch
>>> from mmengine import Config
>>> from mmpretrain.models import build_classifier
>>> cfg = Config.fromfile('configs/resnet/').model
>>> cfg.backbone.out_indices = (0, 1, 2, 3)  # Output multi-scale feature maps
>>> model = build_classifier(cfg)
>>> outs = model.extract_feat(torch.rand(1, 3, 224, 224), stage='neck')
>>> for out in outs:
...     print(out.shape)
torch.Size([1, 64])
torch.Size([1, 128])
torch.Size([1, 256])
torch.Size([1, 512])

  3. Pre-logits output (without the final linear classifier head)

>>> import torch
>>> from mmengine import Config
>>> from mmpretrain.models import build_classifier
>>> cfg = Config.fromfile('configs/vision_transformer/').model
>>> model = build_classifier(cfg)
>>> out = model.extract_feat(torch.rand(1, 3, 224, 224), stage='pre_logits')
>>> print(out.shape)  # The hidden dimension in the head is 3072
torch.Size([1, 3072])

forward(inputs, data_samples=None, mode='tensor')[source]

The unified entry point for the forward process in both training and testing.

The method accepts three modes: “tensor”, “predict” and “loss”:

  • “tensor”: Forward the whole network and return tensor(s) without any post-processing, same as a common PyTorch Module.

  • “predict”: Forward and return the predictions, post-processed into a list of DataSample.

  • “loss”: Forward and return a dict of losses according to the given inputs and data samples.

Parameters:

  • inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.

  • data_samples (List[DataSample], optional) – The annotation data of each sample. Required if mode="loss". Defaults to None.

  • mode (str) – The kind of value to return. Defaults to “tensor”.


Returns:

The return type depends on mode.

  • If mode="tensor", return a tensor or a tuple of tensors.

  • If mode="predict", return a list of mmpretrain.structures.DataSample.

  • If mode="loss", return a dict of tensors.
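
The three modes amount to a dispatch on the `mode` argument. The sketch below shows that pattern with a hypothetical `ToyModel` working on Python lists; the doubling "network", the squared-error loss, and the dict-based predictions are assumptions made for illustration and do not reflect the real classifier's internals.

```python
class ToyModel:
    """Hypothetical model illustrating mode dispatch on plain lists."""

    def forward(self, inputs, data_samples=None, mode='tensor'):
        if mode == 'tensor':
            return self._forward(inputs)
        if mode == 'loss':
            if data_samples is None:
                raise ValueError('data_samples is required when mode="loss".')
            return self.loss(inputs, data_samples)
        if mode == 'predict':
            return self.predict(inputs, data_samples)
        raise RuntimeError(f'Invalid mode "{mode}".')

    def _forward(self, inputs):
        # Raw outputs without any post-processing.
        return [2.0 * x for x in inputs]

    def loss(self, inputs, data_samples):
        # Mean squared error against the targets, returned as a dict.
        scores = self._forward(inputs)
        squared = [(s - t) ** 2 for s, t in zip(scores, data_samples)]
        return {'loss': sum(squared) / len(squared)}

    def predict(self, inputs, data_samples=None):
        # One result record per input sample.
        return [{'pred_score': s} for s in self._forward(inputs)]


m = ToyModel()
print(m.forward([1.0, 2.0]))                            # [2.0, 4.0]
print(m.forward([1.0, 2.0], [2.0, 3.0], mode='loss'))   # {'loss': 0.5}
print(m.forward([1.0, 2.0], mode='predict'))
```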


Get the layer-wise depth of a parameter.

Parameters:


param_name (str) – The name of the parameter.


Returns:

The layer-wise depth and the max depth.

Return type:

Tuple[int, int]

loss(inputs, data_samples)[source]

Calculate losses from a batch of inputs and data samples.

Parameters:

  • inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.

  • data_samples (List[DataSample]) – The annotation data of each sample.


Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]
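
A dictionary of loss components usually has to be reduced to a single scalar before backpropagation. The sketch below assumes one common convention, namely that entries whose key contains "loss" are summed and the remaining entries are treated as logged metrics. The helper `total_loss` and the example keys are hypothetical, and plain floats stand in for tensors.

```python
def total_loss(losses):
    """Sum the entries whose key contains 'loss'; ignore other metrics."""
    return sum(value for key, value in losses.items() if 'loss' in key)


# Hypothetical loss dict; floats stand in for tensors.
losses = {'loss': 0.75, 'loss_aux': 0.25, 'accuracy_top-1': 91.0}
print(total_loss(losses))  # 1.0
```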

predict(inputs, data_samples=None, **kwargs)[source]

Predict results from a batch of inputs.

Parameters:

  • inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.

  • data_samples (List[DataSample], optional) – The annotation data of each sample. Defaults to None.

  • **kwargs – Other keyword arguments accepted by the predict method of head.
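
To make the difference between raw outputs and predictions concrete, the following sketch turns per-sample scores into prediction records. It assumes a single-label head that applies a softmax and takes the arg-max; other heads may post-process differently. The function `postprocess` and the `pred_score` / `pred_label` keys of the returned dicts are illustrative names, with nested lists standing in for tensors.

```python
import math


def postprocess(logits):
    """Convert rows of raw scores into softmax scores and a predicted label."""
    results = []
    for row in logits:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        scores = [e / total for e in exps]
        results.append({
            'pred_score': scores,
            'pred_label': scores.index(max(scores)),
        })
    return results


preds = postprocess([[2.0, 0.5, 0.1], [0.0, 0.0, 3.0]])
print([p['pred_label'] for p in preds])  # [0, 2]
```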