ImageClassifier

class mmpretrain.models.classifiers.ImageClassifier(backbone, neck=None, head=None, pretrained=None, train_cfg=None, data_preprocessor=None, init_cfg=None)

Image classifier for supervised classification tasks.

Parameters:

backbone (dict) – The backbone module. See mmpretrain.models.backbones.
neck (dict, optional) – The neck module to process features from backbone. See mmpretrain.models.necks. Defaults to None.
head (dict, optional) – The head module that makes predictions and calculates the loss from the processed features. See mmpretrain.models.heads. Note that if the head is not set, almost no method other than extract_feat() can be used. Defaults to None.
pretrained (str, optional) – The path of the pretrained checkpoint. Both local and remote paths are supported. Defaults to None.
train_cfg (dict, optional) –
The training settings. The accepted fields are:
- augments (List[dict]): The batch augmentation methods to use. More details can be found in mmpretrain.models.utils.batch_augments.
- probs (List[float], optional): The probability of each batch augmentation method. If None, the methods are chosen with equal probability. Defaults to None.
Defaults to None.
data_preprocessor (dict, optional) – The config for preprocessing input data. If it is None or does not specify a type, “ClsDataPreprocessor” is used as the type. See ClsDataPreprocessor for more details. Defaults to None.
init_cfg (dict, optional) – The config that controls the initialization. Defaults to None.
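The parameters above are normally supplied through a config dict. The following is a minimal sketch of such a config, assuming a ResNet-18 backbone with global average pooling and a linear head; the specific values (depth, num_classes, alpha, probs) are illustrative assumptions rather than defaults of ImageClassifier.

```python
# Illustrative config only: the field values are assumptions chosen for a
# ResNet-18 style model, not defaults of ImageClassifier.
model = dict(
    type='ImageClassifier',
    # backbone: produces feature maps; only the last stage is output here.
    backbone=dict(type='ResNet', depth=18, num_stages=4, out_indices=(3,)),
    # neck: pools each feature map to a vector.
    neck=dict(type='GlobalAveragePooling'),
    # head: makes predictions and calculates the loss.
    head=dict(
        type='LinearClsHead',
        num_classes=1000,
        in_channels=512,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
    ),
    # train_cfg: batch augmentations, with one probability per augmentation.
    train_cfg=dict(
        augments=[dict(type='Mixup', alpha=0.8), dict(type='CutMix', alpha=1.0)],
        probs=[0.5, 0.5],
    ),
)
```

Leaving out head would restrict the resulting model to extract_feat(), as noted in the head parameter description.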

extract_feat(inputs, stage='neck')

Extract features from the input tensor with shape (N, C, …).

Parameters:

inputs (Tensor) – A batch of inputs. Its shape should be (num_samples, num_channels, *img_shape).
stage (str) –
The stage at which to output features. Choose from:
- "backbone": The output of the backbone network. Returns a tuple containing features from multiple stages.
- "neck": The output of the neck module. Returns a tuple containing features from multiple stages.
- "pre_logits": The feature before the final linear classification layer. Usually returns a tensor.
Defaults to "neck".

Returns:

The output of the specified stage. The exact output depends on the implementation of each module. In general, the output of the backbone and the neck is a tuple, and the output of pre_logits is a tensor.

Return type:

tuple | Tensor

Examples

Backbone output

>>> import torch
>>> from mmengine import Config
>>> from mmpretrain.models import build_classifier
>>>
>>> cfg = Config.fromfile('configs/resnet/resnet18_8xb32_in1k.py').model
>>> cfg.backbone.out_indices = (0, 1, 2, 3)  # Output multi-scale feature maps
>>> model = build_classifier(cfg)
>>> outs = model.extract_feat(torch.rand(1, 3, 224, 224), stage='backbone')
>>> for out in outs:
...     print(out.shape)
torch.Size([1, 64, 56, 56])
torch.Size([1, 128, 28, 28])
torch.Size([1, 256, 14, 14])
torch.Size([1, 512, 7, 7])

Neck output

>>> import torch
>>> from mmengine import Config
>>> from mmpretrain.models import build_classifier
>>>
>>> cfg = Config.fromfile('configs/resnet/resnet18_8xb32_in1k.py').model
>>> cfg.backbone.out_indices = (0, 1, 2, 3)  # Output multi-scale feature maps
>>> model = build_classifier(cfg)
>>>
>>> outs = model.extract_feat(torch.rand(1, 3, 224, 224), stage='neck')
>>> for out in outs:
...     print(out.shape)
torch.Size([1, 64])
torch.Size([1, 128])
torch.Size([1, 256])
torch.Size([1, 512])

Pre-logits output (without the final linear classifier head)

>>> import torch
>>> from mmengine import Config
>>> from mmpretrain.models import build_classifier
>>>
>>> cfg = Config.fromfile('configs/vision_transformer/vit-base-p16_pt-64xb64_in1k-224.py').model
>>> model = build_classifier(cfg)
>>>
>>> out = model.extract_feat(torch.rand(1, 3, 224, 224), stage='pre_logits')
>>> print(out.shape)  # The hidden dimension in the head is 3072
torch.Size([1, 3072])

forward(inputs, data_samples=None, mode='tensor')

The unified entry for a forward process in both training and test.

The method accepts three modes: “tensor”, “predict” and “loss”:

“tensor”: Forward the whole network and return the tensor(s) without any post-processing, the same as a common PyTorch Module.
“predict”: Forward and return the predictions, post-processed into a list of DataSample objects.
“loss”: Forward and return a dict of losses computed from the given inputs and data samples.

Parameters:

inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.
data_samples (List[DataSample], optional) – The annotation data of each sample. It is required if mode="loss". Defaults to None.
mode (str) – The kind of value to return. Defaults to ‘tensor’.

Returns:

The return type depends on mode.

If mode="tensor", returns a tensor or a tuple of tensors.
If mode="predict", returns a list of mmpretrain.structures.DataSample.
If mode="loss", returns a dict of tensors.
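The three modes amount to a dispatch on the mode argument. The sketch below mirrors that dispatch with a pure-Python stand-in; ToyClassifier and the bodies of its methods are hypothetical and operate on lists of floats, so it illustrates only the control flow, not the library's implementation.

```python
class ToyClassifier:
    """Hypothetical stand-in that mimics only the mode dispatch."""

    def extract_feat(self, inputs):
        # Pretend the "features" are the inputs themselves.
        return inputs

    def loss(self, inputs, data_samples):
        # Hypothetical loss: mean absolute error against the labels.
        errors = [abs(x - y) for x, y in zip(inputs, data_samples)]
        return {'loss': sum(errors) / len(errors)}

    def predict(self, inputs, data_samples=None):
        # Hypothetical post-processing: one result dict per sample.
        return [{'pred_score': x} for x in inputs]

    def forward(self, inputs, data_samples=None, mode='tensor'):
        # Dispatch on the requested mode; unknown modes are rejected.
        if mode == 'tensor':
            return self.extract_feat(inputs)
        if mode == 'loss':
            return self.loss(inputs, data_samples)
        if mode == 'predict':
            return self.predict(inputs, data_samples)
        raise RuntimeError(f'Invalid mode "{mode}".')


model = ToyClassifier()
raw = model.forward([0.2, 0.9])                                  # mode='tensor'
losses = model.forward([0.2, 0.9], data_samples=[0, 1], mode='loss')
preds = model.forward([0.2, 0.9], mode='predict')
```

Keeping a single entry point lets a training loop and an evaluation loop call the same method and differ only in the mode they pass.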

get_layer_depth(param_name)

Get the layer-wise depth of a parameter.

Parameters:

param_name (str) – The name of the parameter.

Returns:

The layer-wise depth and the max depth.

Return type:

Tuple[int, int]
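One plausible use of the returned (depth, max depth) pair is layer-wise learning-rate decay, where shallower layers get a smaller learning rate. The sketch below assumes that use; lr_scale is a hypothetical helper, and the exponent convention (the deepest layer keeps the base learning rate) is an assumption, not something this page specifies.

```python
def lr_scale(depth: int, max_depth: int, decay_rate: float) -> float:
    """Learning-rate multiplier for a parameter at ``depth`` (hypothetical)."""
    # Assumed convention: a parameter at depth == max_depth - 1 keeps the
    # base learning rate, and each shallower layer is scaled by decay_rate.
    return decay_rate ** (max_depth - depth - 1)


base_lr = 1e-3
max_depth = 14  # illustrative value, as if returned by get_layer_depth()
# Multipliers for the shallowest layer and the two deepest layers.
scales = {depth: lr_scale(depth, max_depth, 0.5) for depth in (0, 12, 13)}
learning_rates = {depth: base_lr * s for depth, s in scales.items()}
```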

loss(inputs, data_samples)

Calculate losses from a batch of inputs and data samples.

Parameters:

inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.
data_samples (List[DataSample]) – The annotation data of each sample.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]

predict(inputs, data_samples=None, **kwargs)

Predict results from a batch of inputs.

Parameters:

inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.
data_samples (List[DataSample], optional) – The annotation data of each sample. Defaults to None.
**kwargs – Other keyword arguments accepted by the predict method of the head.