mmpretrain.models

The models package contains several subpackages, each corresponding to a different component of a neural network. A minimal sketch of how these components compose into one model follows the list below.

  • classifiers: The top-level module which defines the whole process of a classification model.

  • selfsup: The top-level module which defines the whole process of a self-supervised learning model.

  • retrievers: The top-level module which defines the whole process of a retrieval model.

  • backbones: Usually a feature extraction network, e.g., ResNet, MobileNet.

  • necks: The component between backbones and heads, e.g., GlobalAveragePooling.

  • heads: The component for specific tasks.

  • losses: Loss functions.

  • peft: The PEFT (Parameter-Efficient Fine-Tuning) module, e.g. LoRAModel.

  • utils: Some helper functions and common components used in various networks.
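
To make the composition concrete, here is the sketch referenced above, written in the MMEngine config style. All `type` names appear in this reference; the numeric values (depth, in_channels, num_classes) are illustrative assumptions, not required settings.

```python
# A minimal sketch of how the subpackages compose into one model config.
# The numeric values (depth=50, in_channels=2048, num_classes=1000) are
# assumptions for illustration.
model = dict(
    type='ImageClassifier',                  # classifiers: the top-level module
    backbone=dict(type='ResNet', depth=50),  # backbones: feature extraction
    neck=dict(type='GlobalAveragePooling'),  # necks: between backbone and head
    head=dict(
        type='LinearClsHead',                # heads: the task-specific component
        num_classes=1000,
        in_channels=2048,
        loss=dict(type='CrossEntropyLoss'),  # losses: the loss function
    ),
)
```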

Build Functions

build_classifier

Build classifier.

build_backbone

Build backbone.

build_neck

Build neck.

build_head

Build head.

build_loss

Build loss.
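
A minimal sketch of these build functions in action; the ResNet arguments and the expected output shape assume a standard ResNet-50 with a 224x224 input.

```python
import torch
from mmpretrain.models import build_backbone, build_neck

# Build components from config dicts; argument values are illustrative.
backbone = build_backbone(dict(type='ResNet', depth=50, out_indices=(3,)))
neck = build_neck(dict(type='GlobalAveragePooling'))

x = torch.rand(1, 3, 224, 224)
feats = backbone(x)     # backbones return a tuple of feature maps
pooled = neck(feats)    # the neck pools each map to a vector
print(pooled[0].shape)  # expected: torch.Size([1, 2048])
```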

Classifiers

BaseClassifier

Base class for classifiers.

ImageClassifier

Image classifier for supervised classification tasks.

TimmClassifier

Image classifier for pytorch-image-models (timm) models.

HuggingFaceClassifier

Image classifier for HuggingFace models.
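
As a quick usage sketch, a classifier can also be fetched by model-zoo name via mmpretrain's get_model helper; the name used below is an assumption and must exist in your installed model zoo.

```python
from mmpretrain import get_model

# Fetch a classifier by name; 'resnet50_8xb32_in1k' is an assumed
# model-zoo name and may differ across versions.
model = get_model('resnet50_8xb32_in1k', pretrained=False)
print(type(model).__name__)  # expected: ImageClassifier
```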

Self-supervised Algorithms

BaseSelfSupervisor

BaseModel for Self-Supervised Learning.

BEiT

BEiT v1/v2.

BYOL

BYOL.

BarlowTwins

BarlowTwins.

CAE

CAE.

DenseCL

DenseCL.

EVA

EVA.

iTPN

iTPN.

MAE

MAE.

MILAN

MILAN.

MaskFeat

MaskFeat.

MixMIM

MixMIM.

MoCo

MoCo.

MoCoV3

MoCo v3.

SimCLR

SimCLR.

SimMIM

SimMIM.

SimSiam

SimSiam.

SparK

Implementation of SparK.

SwAV

SwAV.

Some of the above algorithms modify the backbone module to adapt to extra inputs such as masks. Here is a list of these modified backbone modules; a config sketch pairing one algorithm with its modified backbone follows the list.

BEiTPretrainViT

Vision Transformer for BEiT pre-training.

CAEPretrainViT

Vision Transformer for CAE pre-training; the implementation is based on BEiTViT.

iTPNHiViT

HiViT for iTPN pre-training.

MAEHiViT

HiViT for MAE pre-training.

MAEViT

Vision Transformer for MAE pre-training.

MILANViT

Vision Transformer for MILAN pre-training.

MaskFeatViT

Vision Transformer for MaskFeat pre-training.

MixMIMPretrainTransformer

MixMIM backbone for MixMIM pre-training.

MoCoV3ViT

Vision Transformer for MoCoV3 pre-training.

SimMIMSwinTransformer

Swin Transformer for SimMIM pre-training.
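
Here is the config sketch referenced above: a hedged MAE-style pairing of an algorithm with its modified backbone and pre-training neck/head. The argument names and values are assumptions modeled on typical MMPretrain configs and may differ across versions.

```python
# A hedged sketch pairing the MAE algorithm with its modified backbone
# (MAEViT). All values are assumptions modeled on typical configs.
model = dict(
    type='MAE',
    backbone=dict(type='MAEViT', arch='b', patch_size=16, mask_ratio=0.75),
    neck=dict(
        type='MAEPretrainDecoder',
        patch_size=16,
        in_chans=3,
        embed_dim=768,
        decoder_embed_dim=512,
        decoder_depth=8,
        decoder_num_heads=16,
    ),
    head=dict(
        type='MAEPretrainHead',
        norm_pix=True,
        patch_size=16,
        loss=dict(type='PixelReconstructionLoss', criterion='L2'),
    ),
)
```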

Some self-supervised algorithms need an external target generator to produce the optimization target. Here is a list of target generators; a short usage sketch follows the list.

VQKD

Vector-Quantized Knowledge Distillation.

DALLEEncoder

DALL-E Encoder for feature extraction.

HOGGenerator

Generate HOG features for images.

CLIPGenerator

Get the features and attention from the last layer of CLIP.
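
Here is the usage sketch referenced above; the HOGGenerator constructor arguments are assumed defaults and may differ across versions.

```python
import torch
from mmpretrain.models import HOGGenerator

# Compute HOG targets for a batch of images, as MaskFeat does during
# pre-training. The arguments (nbins, pool, gaussian_window) are
# assumed defaults.
hog = HOGGenerator(nbins=9, pool=8, gaussian_window=16)
imgs = torch.rand(2, 3, 224, 224)
targets = hog(imgs)
print(targets.shape)
```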

Retrievers

BaseRetriever

Base class for retrievers.

ImageToImageRetriever

Image-to-image retriever for supervised retrieval tasks.

Multi-Modality Algorithms

Blip2Caption

BLIP2 Caption.

Blip2Retrieval

BLIP2 Retriever.

Blip2VQA

BLIP2 VQA.

BlipCaption

BLIP Caption.

BlipGrounding

BLIP Grounding.

BlipNLVR

BLIP NLVR.

BlipRetrieval

BLIP Retriever.

BlipVQA

BLIP VQA.

Flamingo

The Open Flamingo model for multiple tasks.

OFA

The OFA model for multiple tasks.

MiniGPT4

The multi-modality model of MiniGPT-4.

Llava

The LLaVA model for multiple tasks.

Otter

The Otter model for multiple tasks.
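
These models are typically driven through the unified inference API; a hedged sketch, assuming a BLIP caption model name and an image path that must exist in your environment:

```python
from mmpretrain import inference_model

# inference_model picks the right inferencer from the model's task.
# Both the model name and the image path here are assumptions.
result = inference_model('blip-base_3rdparty_caption', 'demo/cat-dog.png')
print(result)
```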

Backbones

AlexNet

AlexNet backbone.

BEiTViT

Backbone for BEiT.

CSPDarkNet

CSP-Darknet backbone used in YOLOv4.

CSPNet

The abstract CSP Network class.

CSPResNeXt

CSP-ResNeXt backbone.

CSPResNet

CSP-ResNet backbone.

Conformer

Conformer backbone.

ConvMixer

ConvMixer.

ConvNeXt

ConvNeXt v1&v2 backbone.

DaViT

DaViT.

DeiT3

DeiT3 backbone.

DenseNet

DenseNet.

DistilledVisionTransformer

Distilled Vision Transformer.

EdgeNeXt

EdgeNeXt.

EfficientFormer

EfficientFormer.

EfficientNet

EfficientNet backbone.

EfficientNetV2

EfficientNetV2 backbone.

HiViT

HiViT.

HRNet

HRNet backbone.

HorNet

HorNet backbone.

InceptionV3

Inception V3 backbone.

LeNet5

LeNet5 backbone.

LeViT

LeViT backbone.

MViT

Multi-scale ViT v2.

MlpMixer

MLP-Mixer backbone.

MobileNetV2

MobileNetV2 backbone.

MobileNetV3

MobileNetV3 backbone.

MobileOne

MobileOne backbone.

MobileViT

MobileViT backbone.

PCPVT

The backbone of Twins-PCPVT.

PoolFormer

PoolFormer.

PyramidVig

Pyramid Vision GNN backbone.

RegNet

RegNet backbone.

RepLKNet

RepLKNet backbone.

RepMLPNet

RepMLPNet backbone.

RepVGG

RepVGG backbone.

Res2Net

Res2Net backbone.

ResNeSt

ResNeSt backbone.

ResNeXt

ResNeXt backbone.

ResNet

ResNet backbone.

ResNetV1c

ResNetV1c backbone.

ResNetV1d

ResNetV1d backbone.

ResNet_CIFAR

ResNet backbone for CIFAR.

RevVisionTransformer

Reversible Vision Transformer.

SEResNeXt

SEResNeXt backbone.

SEResNet

SEResNet backbone.

SVT

The backbone of Twins-SVT.

ShuffleNetV1

ShuffleNetV1 backbone.

ShuffleNetV2

ShuffleNetV2 backbone.

SparseResNet

ResNet with sparse module conversion function.

SparseConvNeXt

ConvNeXt with sparse module conversion function.

SwinTransformer

Swin Transformer.

SwinTransformerV2

Swin Transformer V2.

T2T_ViT

Tokens-to-Token Vision Transformer (T2T-ViT).

TIMMBackbone

Wrapper to use backbones from the timm library.

TNT

Transformer in Transformer.

VAN

Visual Attention Network.

VGG

VGG backbone.

Vig

Vision GNN backbone.

VisionTransformer

Vision Transformer.

ViTSAM

Vision Transformer as image encoder used in SAM.

XCiT

XCiT backbone.

ViTEVA02

EVA02 Vision Transformer.
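
All backbones share one interface: forward() returns a tuple of feature maps, one per requested stage. A minimal sketch, assuming illustrative ResNet-18 settings:

```python
import torch
from mmpretrain.models import build_backbone

# Request all four stages; out_indices values are illustrative.
backbone = build_backbone(
    dict(type='ResNet', depth=18, out_indices=(0, 1, 2, 3)))
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.rand(1, 3, 224, 224))
for f in feats:
    print(f.shape)  # expected: (1, 64, 56, 56) through (1, 512, 7, 7)
```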

Necks

BEiTV2Neck

Neck for BEiTV2 Pre-training.

CAENeck

Neck for CAE Pre-training.

ClsBatchNormNeck

Normalize cls token across batch before head.

DenseCLNeck

The non-linear neck of DenseCL.

GeneralizedMeanPooling

Generalized Mean Pooling neck.

GlobalAveragePooling

Global Average Pooling neck.

HRFuseScales

Fuse feature map of multiple scales in HRNet.

LinearNeck

Linear neck with dimension projection.

MAEPretrainDecoder

Decoder for MAE Pre-training.

MILANPretrainDecoder

Prompt decoder for MILAN.

MixMIMPretrainDecoder

Decoder for MixMIM Pretraining.

MoCoV2Neck

The non-linear neck of MoCo v2: fc-relu-fc.

NonLinearNeck

The non-linear neck.

SimMIMLinearDecoder

Linear decoder for SimMIM pre-training.

SwAVNeck

The non-linear neck of SwAV: fc-bn-relu-fc-normalization.

iTPNPretrainDecoder

The neck module of iTPN (transformer pyramid network).

SparKLightDecoder

The decoder for SparK, which upsamples the feature maps.

Heads

ArcFaceClsHead

ArcFace classifier head.

BEiTV1Head

Head for BEiT v1 Pre-training.

BEiTV2Head

Head for BEiT v2 Pre-training.

CAEHead

Head for CAE Pre-training.

CSRAClsHead

Class-specific residual attention classifier head.

ClsHead

Classification head.

ConformerHead

Linear classifier head.

ContrastiveHead

Head for contrastive learning.

DeiTClsHead

Distilled Vision Transformer classifier head.

EfficientFormerClsHead

EfficientFormer classifier head.

LatentCrossCorrelationHead

Head for latent feature cross correlation.

LatentPredictHead

Head for latent feature prediction.

LeViTClsHead

LeViT classifier head.

LinearClsHead

Linear classifier head.

MAEPretrainHead

Head for MAE Pre-training.

MIMHead

Pre-training head for Masked Image Modeling.

MixMIMPretrainHead

Head for MixMIM Pre-training.

MoCoV3Head

Head for MoCo v3 Pre-training.

MultiLabelClsHead

Classification head for multilabel tasks.

MultiLabelLinearClsHead

Linear classification head for multilabel tasks.

MultiTaskHead

Multi-task head.

SimMIMHead

Head for SimMIM Pre-training.

StackedLinearClsHead

Classifier head with several hidden fc layers and an output fc layer.

SwAVHead

Head for SwAV Pre-training.

VigClsHead

The classification head for Vision GNN.

VisionTransformerClsHead

Vision Transformer classifier head.

iTPNClipHead

Head for iTPN pre-training using CLIP.

SparKPretrainHead

Pre-training head for SparK.
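
Classification heads consume the tuple produced by the neck and usually embed their loss config; a minimal sketch, assuming a 2048-dimensional input and 1000 classes:

```python
import torch
from mmpretrain.models import build_head

# num_classes and in_channels are illustrative assumptions.
head = build_head(dict(
    type='LinearClsHead',
    num_classes=1000,
    in_channels=2048,
    loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
))
feats = (torch.rand(4, 2048),)  # heads take the tuple output of the neck
logits = head(feats)
print(logits.shape)  # expected: torch.Size([4, 1000])
```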

Losses

AsymmetricLoss

Asymmetric loss.

CAELoss

Loss function for CAE.

CosineSimilarityLoss

Cosine similarity loss function.

CrossCorrelationLoss

Cross correlation loss function.

CrossEntropyLoss

Cross entropy loss.

FocalLoss

Focal loss.

LabelSmoothLoss

Label smoothed cross entropy loss.

PixelReconstructionLoss

Loss for pixel reconstruction in Masked Image Modeling.

SeesawLoss

Implementation of seesaw loss.

SwAVLoss

The Loss for SwAV.
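
Losses are built from configs like any other component and called on predictions and targets; a minimal sketch with assumed shapes:

```python
import torch
from mmpretrain.models import build_loss

# loss_weight and the tensor shapes are illustrative assumptions.
loss_fn = build_loss(dict(type='CrossEntropyLoss', loss_weight=1.0))
cls_score = torch.rand(4, 10)       # logits for 4 samples, 10 classes
label = torch.randint(0, 10, (4,))  # integer class labels
print(loss_fn(cls_score, label))    # a scalar loss tensor
```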

PEFT

LoRAModel

Implements LoRA in a module.
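
A hedged config sketch of wrapping a backbone with LoRAModel; the argument names (module, alpha, rank, drop_rate, targets) follow my reading of the implementation and may differ across versions.

```python
# A hedged sketch: wrap a ViT backbone so only low-rank adapters on the
# matched layers (here, attention qkv projections) are trained. All
# argument names and values are assumptions.
model = dict(
    type='LoRAModel',
    module=dict(type='VisionTransformer', arch='b', patch_size=16),
    alpha=16,                    # scaling factor of the low-rank update
    rank=16,                     # rank of the decomposition
    drop_rate=0.1,
    targets=[dict(type='qkv')],  # apply LoRA to layers matching this name
)
```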

models.utils

This package includes some helper functions and common components used in various networks.

Common Components

ConditionalPositionEncoding

The Conditional Position Encoding (CPE) module.

CosineEMA

CosineEMA updates the momentum parameter on a cosine schedule; used in BYOL, MoCoV3, etc.

HybridEmbed

CNN Feature Map Embedding.

InvertedResidual

Inverted Residual Block.

LayerScale

LayerScale layer.

MultiheadAttention

Multi-head Attention Module.

PatchEmbed

Image to Patch Embedding.

PatchMerging

Merge patch feature map.

SELayer

Squeeze-and-Excitation Module.

ShiftWindowMSA

Shift Window Multihead Self-Attention Module.

WindowMSA

Window based multi-head self-attention (W-MSA) module with relative position bias.

WindowMSAV2

Window based multi-head self-attention (W-MSA) module with relative position bias.

Helper Functions

channel_shuffle

Channel Shuffle operation.

is_tracing

Determine whether the model is being traced with torch.jit.trace.

make_divisible

Round a channel number so that it is divisible by a given divisor.

resize_pos_embed

Resize pos_embed weights.

resize_relative_position_bias_table

Resize relative position bias table.

to_ntuple

Generate a function that converts its input to an n-tuple.
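
A few of these helpers in action; the printed values assume the common implementations of these utilities:

```python
import torch
from mmpretrain.models.utils import channel_shuffle, make_divisible, to_ntuple

# Round a channel count to a multiple of the divisor.
print(make_divisible(37, 8))   # expected: 40

# Rearrange channels across groups (the ShuffleNet operation).
x = torch.rand(1, 8, 4, 4)
print(channel_shuffle(x, groups=2).shape)  # torch.Size([1, 8, 4, 4])

# Build a converter that repeats a scalar into an n-tuple.
to_2tuple = to_ntuple(2)
print(to_2tuple(7))            # expected: (7, 7)
```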
