ImageClassifier
- class mmpretrain.models.classifiers.ImageClassifier(backbone, neck=None, head=None, pretrained=None, train_cfg=None, data_preprocessor=None, init_cfg=None)
Image classifier for supervised classification tasks.
- Parameters:
backbone (dict) – The backbone module. See mmpretrain.models.backbones.
neck (dict, optional) – The neck module that processes features from the backbone. See mmpretrain.models.necks. Defaults to None.
head (dict, optional) – The head module that makes predictions and calculates the loss from the processed features. See mmpretrain.models.heads. Note that if the head is not set, almost no methods can be used except extract_feat(). Defaults to None.
pretrained (str, optional) – The pretrained checkpoint path. Both local and remote paths are supported. Defaults to None.
train_cfg (dict, optional) – The training settings. Defaults to None. The accepted fields are:
augments (List[dict]): The batch augmentation methods to use. More details can be found in mmpretrain.model.utils.augment.
probs (List[float], optional): The probability of each batch augmentation method. If None, the methods are chosen with equal probability. Defaults to None.
data_preprocessor (dict, optional) – The config for preprocessing input data. If None, or if no type is specified, “ClsDataPreprocessor” is used as the type. See ClsDataPreprocessor for more details. Defaults to None.
init_cfg (dict, optional) – The config that controls initialization. Defaults to None.
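As a rough illustration of how these parameters fit together, the sketch below writes a model config as a plain Python dict. The `type` names (`ResNet`, `GlobalAveragePooling`, `LinearClsHead`, `CrossEntropyLoss`, `Mixup`, `CutMix`) and all numeric values are assumptions modelled on typical mmpretrain configs, not something this page specifies.

```python
# Hypothetical model config for illustration only. The `type` names and
# numeric values are assumptions, not taken from the reference above.
model = dict(
    type='ImageClassifier',
    backbone=dict(type='ResNet', depth=18, num_stages=4, out_indices=(3,)),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=1000,
        in_channels=512,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
    ),
    train_cfg=dict(
        # One entry per batch augmentation method.
        augments=[dict(type='Mixup', alpha=0.8), dict(type='CutMix', alpha=1.0)],
        # One probability per entry in `augments`; omit to choose evenly.
        probs=[0.5, 0.5],
    ),
)
```

Leaving out `neck` or `head` corresponds to the `None` defaults described above. Without a head, only feature extraction is expected to work.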
- extract_feat(inputs, stage='neck')
Extract features from the input tensor with shape (N, C, …).
- Parameters:
inputs (Tensor) – A batch of inputs. Its shape should be (num_samples, num_channels, *img_shape).
stage (str) – The stage at which to output features. Defaults to “neck”. Choose from:
“backbone”: The output of the backbone network. Returns a tuple of features from multiple stages.
“neck”: The output of the neck module. Returns a tuple of features from multiple stages.
“pre_logits”: The feature before the final classification linear layer. Usually returns a tensor.
- Returns:
The output of the specified stage. The exact output depends on the implementation. In general, the backbone and neck outputs are tuples and the pre_logits output is a tensor.
- Return type:
tuple | Tensor
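The stage selection described above can be pictured as an early-exit pipeline. The following is a minimal sketch, not the library implementation: `extract_feat_sketch` and the three toy functions are hypothetical, and plain Python lists stand in for tensors.

```python
def extract_feat_sketch(inputs, backbone, neck=None, pre_logits=None, stage='neck'):
    """Toy stand-in for staged feature extraction (not the library code)."""
    if stage not in ('backbone', 'neck', 'pre_logits'):
        raise ValueError(f'Invalid stage "{stage}".')

    feats = backbone(inputs)  # tuple of per-stage features
    if stage == 'backbone':
        return feats

    if neck is not None:
        feats = neck(feats)  # still a tuple, one entry per stage
    if stage == 'neck':
        return feats

    if pre_logits is None:
        raise ValueError('stage="pre_logits" requires a head.')
    return pre_logits(feats)  # a single feature, not a tuple


# Hypothetical components operating on lists of floats.
def toy_backbone(x):
    return (x, [v * 2 for v in x])


def toy_neck(feats):
    # Average each stage's features, loosely mimicking global pooling.
    return tuple([sum(f) / len(f)] for f in feats)


def toy_pre_logits(feats):
    # Use only the last stage's features.
    return feats[-1]


sample = [1.0, 3.0]
backbone_out = extract_feat_sketch(sample, toy_backbone, toy_neck, toy_pre_logits, stage='backbone')
neck_out = extract_feat_sketch(sample, toy_backbone, toy_neck, toy_pre_logits, stage='neck')
pre_logits_out = extract_feat_sketch(sample, toy_backbone, toy_neck, toy_pre_logits, stage='pre_logits')
```

In this toy, the first two stages return tuples and the last returns a single value, matching the general pattern stated in the return description.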
Examples
Backbone output
>>> import torch
>>> from mmengine import Config
>>> from mmpretrain.models import build_classifier
>>>
>>> cfg = Config.fromfile('configs/resnet/resnet18_8xb32_in1k.py').model
>>> cfg.backbone.out_indices = (0, 1, 2, 3)  # Output multi-scale feature maps
>>> model = build_classifier(cfg)
>>> outs = model.extract_feat(torch.rand(1, 3, 224, 224), stage='backbone')
>>> for out in outs:
...     print(out.shape)
torch.Size([1, 64, 56, 56])
torch.Size([1, 128, 28, 28])
torch.Size([1, 256, 14, 14])
torch.Size([1, 512, 7, 7])
Neck output
>>> import torch
>>> from mmengine import Config
>>> from mmpretrain.models import build_classifier
>>>
>>> cfg = Config.fromfile('configs/resnet/resnet18_8xb32_in1k.py').model
>>> cfg.backbone.out_indices = (0, 1, 2, 3)  # Output multi-scale feature maps
>>> model = build_classifier(cfg)
>>>
>>> outs = model.extract_feat(torch.rand(1, 3, 224, 224), stage='neck')
>>> for out in outs:
...     print(out.shape)
torch.Size([1, 64])
torch.Size([1, 128])
torch.Size([1, 256])
torch.Size([1, 512])
Pre-logits output (without the final linear classifier head)
>>> import torch
>>> from mmengine import Config
>>> from mmpretrain.models import build_classifier
>>>
>>> cfg = Config.fromfile('configs/vision_transformer/vit-base-p16_pt-64xb64_in1k-224.py').model
>>> model = build_classifier(cfg)
>>>
>>> out = model.extract_feat(torch.rand(1, 3, 224, 224), stage='pre_logits')
>>> print(out.shape)  # The hidden dimension in the head is 3072
torch.Size([1, 3072])
- forward(inputs, data_samples=None, mode='tensor')
The unified entry point for a forward pass in both training and testing.
The method accepts three modes: “tensor”, “predict” and “loss”:
“tensor”: Forward the whole network and return tensor(s) without any post-processing, same as a common PyTorch Module.
“predict”: Forward and return the predictions, fully processed into a list of DataSample.
“loss”: Forward and return a dict of losses according to the given inputs and data samples.
- Parameters:
inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.
data_samples (List[DataSample], optional) – The annotation data of every sample. Required if mode="loss". Defaults to None.
mode (str) – The kind of value to return. Defaults to ‘tensor’.
- Returns:
The return type depends on mode.
If mode="tensor", return a tensor or a tuple of tensors.
If mode="predict", return a list of mmpretrain.structures.DataSample.
If mode="loss", return a dict of tensors.
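The mode-based dispatch can be sketched as follows. This is a toy under stated assumptions rather than the library code: `ToyClassifier`, its `_logits` helper, and the error-rate "loss" are hypothetical, with plain lists and integer labels standing in for tensors and DataSample objects.

```python
class ToyClassifier:
    """Hypothetical stand-in that mirrors the three forward modes."""

    def _logits(self, inputs):
        # Pretend network: two "logits" per scalar input.
        return [[x, -x] for x in inputs]

    def loss(self, inputs, data_samples):
        # Toy loss: fraction of samples whose argmax differs from the label.
        logits = self._logits(inputs)
        errors = sum(
            1 for row, label in zip(logits, data_samples)
            if row.index(max(row)) != label
        )
        return {'loss': errors / len(inputs)}

    def predict(self, inputs, data_samples=None):
        # Toy post-processing: one result record per sample.
        return [{'pred_label': row.index(max(row))} for row in self._logits(inputs)]

    def forward(self, inputs, data_samples=None, mode='tensor'):
        if mode == 'tensor':
            return self._logits(inputs)
        if mode == 'loss':
            if data_samples is None:
                raise ValueError('data_samples is required when mode="loss".')
            return self.loss(inputs, data_samples)
        if mode == 'predict':
            return self.predict(inputs, data_samples)
        raise RuntimeError(f'Invalid mode "{mode}".')


model = ToyClassifier()
tensor_out = model.forward([2.0, -1.0])
preds = model.forward([2.0, -1.0], mode='predict')
losses = model.forward([2.0, -1.0], data_samples=[0, 0], mode='loss')
```

Routing all three behaviours through one method lets a training loop and an evaluation loop call the same entry point, differing only in the `mode` argument.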
- loss(inputs, data_samples)
Calculate losses from a batch of inputs and data samples.
- Parameters:
inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.
data_samples (List[DataSample]) – The annotation data of every sample.
- Returns:
A dictionary of loss components.
- Return type:
dict[str, Tensor]
- predict(inputs, data_samples=None, **kwargs)
Predict results from a batch of inputs.
- Parameters:
inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.
data_samples (List[DataSample], optional) – The annotation data of every sample. Defaults to None.
**kwargs – Other keyword arguments accepted by the predict method of head.