mmpretrain.models

The models package contains several sub-packages that address the different components of a model; a short sketch of how these parts fit together follows the list.

  • classifiers: The top-level module which defines the whole process of a classification model.

  • selfsup: The top-level module which defines the whole process of a self-supervised learning model.

  • retrievers: The top-level module which defines the whole process of a retrieval model.

  • backbones: Usually a feature extraction network, e.g., ResNet, MobileNet.

  • necks: The component between backbones and heads, e.g., GlobalAveragePooling.

  • heads: The component for specific tasks.

  • losses: Loss functions.

  • peft: The PEFT (Parameter-Efficient Fine-Tuning) module, e.g. LoRAModel.

  • utils: Some helper functions and common components used in various networks.
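
A built model exposes these components as attributes. As a minimal sketch, assuming the model-zoo name below is available, you can fetch a model with get_model and inspect its parts:

    from mmpretrain import get_model

    # 'resnet50_8xb32_in1k' is a model-zoo identifier used for illustration.
    model = get_model('resnet50_8xb32_in1k', pretrained=False)
    print(type(model).__name__)           # ImageClassifier, from classifiers
    print(type(model.backbone).__name__)  # ResNet, from backbones
    print(type(model.neck).__name__)      # GlobalAveragePooling, from necks
    print(type(model.head).__name__)      # LinearClsHead, from heads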

Build Functions

build_classifier

Build classifier.

build_backbone

Build backbone.

build_neck

Build neck.

build_head

Build head.

build_loss

Build loss.
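
Each build function instantiates a registered component from a config dict. A minimal sketch; the config values (depth, num_classes, in_channels) follow the usual mmpretrain config conventions and are illustrative:

    from mmpretrain.models import (build_backbone, build_head, build_loss,
                                   build_neck)

    backbone = build_backbone(dict(type='ResNet', depth=50))
    neck = build_neck(dict(type='GlobalAveragePooling'))
    head = build_head(dict(type='LinearClsHead', num_classes=1000,
                           in_channels=2048,
                           loss=dict(type='CrossEntropyLoss')))
    loss = build_loss(dict(type='CrossEntropyLoss'))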

Classifiers

BaseClassifier

Base class for classifiers.

ImageClassifier

Image classifier for supervised classification tasks.

TimmClassifier

Image classifier wrapping pytorch-image-models (timm) models.

HuggingFaceClassifier

Image classifier for HuggingFace models.
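
A classifier composes a backbone, an optional neck, and a head. A minimal sketch of assembling and probing an ImageClassifier; the ResNet-18 settings are illustrative, and its final stage yields 512 channels:

    import torch
    from mmpretrain.models import build_classifier

    model = build_classifier(dict(
        type='ImageClassifier',
        backbone=dict(type='ResNet', depth=18),
        neck=dict(type='GlobalAveragePooling'),
        head=dict(type='LinearClsHead', num_classes=10, in_channels=512,
                  loss=dict(type='CrossEntropyLoss'))))
    feats = model.extract_feat(torch.rand(1, 3, 224, 224))
    print(feats[-1].shape)  # torch.Size([1, 512])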

Self-supervised Algorithms

BaseSelfSupervisor

BaseModel for Self-Supervised Learning.

BEiT

BEiT v1/v2.

BYOL

BYOL.

BarlowTwins

BarlowTwins.

CAE

CAE.

DenseCL

DenseCL.

EVA

EVA.

iTPN

iTPN.

MAE

MAE.

MILAN

MILAN.

MaskFeat

MaskFeat.

MixMIM

MixMIM.

MoCo

MoCo.

MoCoV3

MoCo v3.

SimCLR

SimCLR.

SimMIM

SimMIM.

SimSiam

SimSiam.

SparK

Implementation of SparK.

SwAV

SwAV.

Some of the above algorithms modify the backbone module to adapt to extra inputs such as masks. Here is a list of these modified backbone modules (a build sketch pairing an algorithm with its backbone follows the list).

BEiTPretrainViT

Vision Transformer for BEiT pre-training.

CAEPretrainViT

Vision Transformer for CAE pre-training; the implementation is based on BEiTViT.

iTPNHiViT

HiViT for iTPN pre-training.

MAEHiViT

HiViT for MAE pre-training.

MAEViT

Vision Transformer for MAE pre-training.

MILANViT

Vision Transformer for MILAN pre-training.

MaskFeatViT

Vision Transformer for MaskFeat pre-training.

MixMIMPretrainTransformer

MixMIM backbone for MixMIM pre-training.

MoCoV3ViT

Vision Transformer for MoCoV3 pre-training.

SimMIMSwinTransformer

Swin Transformer for SimMIM pre-training.
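
For example, the MAE algorithm pairs with the MAEViT backbone above. A hedged build sketch; the hyper-parameters mirror the library's MAE ViT-B/16 config and should be treated as illustrative:

    from mmpretrain.registry import MODELS

    mae = MODELS.build(dict(
        type='MAE',
        backbone=dict(type='MAEViT', arch='b', patch_size=16, mask_ratio=0.75),
        neck=dict(
            type='MAEPretrainDecoder',
            patch_size=16,
            in_chans=3,
            embed_dim=768,
            decoder_embed_dim=512,
            decoder_depth=8,
            decoder_num_heads=16,
            mlp_ratio=4.),
        head=dict(
            type='MAEPretrainHead',
            norm_pix=True,
            patch_size=16,
            loss=dict(type='PixelReconstructionLoss', criterion='L2'))))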

Some self-supervised algorithms need an external target generator to produce the optimization target. Here is a list of target generators (a usage sketch follows the list).

VQKD

Vector-Quantized Knowledge Distillation.

DALLEEncoder

DALL-E Encoder for feature extraction.

HOGGenerator

Generate HOG feature for images.

CLIPGenerator

Get the features and attention from the last layer of CLIP.
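
Target generators are built from the MODELS registry and called on a batch of images. A minimal sketch with HOGGenerator; the parameter values mirror the MaskFeat setup and are illustrative:

    import torch
    from mmpretrain.registry import MODELS

    hog = MODELS.build(dict(type='HOGGenerator', nbins=9, pool=8,
                            gaussian_window=16))
    targets = hog(torch.rand(2, 3, 224, 224))  # HOG regression targets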

Retrievers

BaseRetriever

Base class for retrievers.

ImageToImageRetriever

Image-to-image retriever for supervised retrieval tasks.
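
A hedged construction sketch for ImageToImageRetriever, assuming the encoder is given as a backbone-plus-neck list and the gallery prototype as a pre-computed feature tensor; the random features below are stand-ins, and a dataloader or file path can also serve as the prototype:

    import torch
    from mmpretrain.registry import MODELS

    retriever = MODELS.build(dict(
        type='ImageToImageRetriever',
        image_encoder=[
            dict(type='ResNet', depth=50),
            dict(type='GlobalAveragePooling'),
        ],
        # Stand-in gallery: features of 100 images.
        prototype=torch.rand(100, 2048)))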

Multi-Modality Algorithms

Blip2Caption

BLIP2 Caption.

Blip2Retrieval

BLIP2 Retriever.

Blip2VQA

BLIP2 VQA.

BlipCaption

BLIP Caption.

BlipGrounding

BLIP Grounding.

BlipNLVR

BLIP NLVR.

BlipRetrieval

BLIP Retriever.

BlipVQA

BLIP VQA.

Flamingo

The Open Flamingo model for multiple tasks.

OFA

The OFA model for multiple tasks.

MiniGPT4

The multi-modality model of MiniGPT-4.

Llava

The LLaVA model for multiple tasks.

Otter

The Otter model for multiple tasks.
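
These models are most conveniently driven through the high-level inferencers. A hedged sketch using ImageCaptionInferencer; the checkpoint name and demo image path are assumptions, so list the available checkpoints first (running this downloads weights):

    from mmpretrain import ImageCaptionInferencer, list_models

    print(list_models('blip'))  # discover available BLIP checkpoints
    # 'blip-base_3rdparty_caption' is assumed to be among them.
    inferencer = ImageCaptionInferencer('blip-base_3rdparty_caption')
    # The image path assumes the repository's demo folder.
    result = inferencer('demo/cat-dog.png')[0]
    print(result['pred_caption'])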

Backbones

AlexNet

AlexNet backbone.

BEiTViT

Backbone for BEiT.

CSPDarkNet

CSP-Darknet backbone used in YOLOv4.

CSPNet

The abstract CSP Network class.

CSPResNeXt

CSP-ResNeXt backbone.

CSPResNet

CSP-ResNet backbone.

Conformer

Conformer backbone.

ConvMixer

ConvMixer.

ConvNeXt

ConvNeXt v1&v2 backbone.

DaViT

DaViT.

DeiT3

DeiT3 backbone.

DenseNet

DenseNet.

DistilledVisionTransformer

Distilled Vision Transformer.

EdgeNeXt

EdgeNeXt.

EfficientFormer

EfficientFormer.

EfficientNet

EfficientNet backbone.

EfficientNetV2

EfficientNetV2 backbone.

HiViT

HiViT.

HRNet

HRNet backbone.

HorNet

HorNet backbone.

InceptionV3

Inception V3 backbone.

LeNet5

LeNet5 backbone.

LeViT

LeViT backbone.

MViT

Multi-scale ViT v2.

MlpMixer

MLP-Mixer backbone.

MobileNetV2

MobileNetV2 backbone.

MobileNetV3

MobileNetV3 backbone.

MobileOne

MobileOne backbone.

MobileViT

MobileViT backbone.

PCPVT

The backbone of Twins-PCPVT.

PoolFormer

PoolFormer.

PyramidVig

Pyramid Vision GNN backbone.

RegNet

RegNet backbone.

RepLKNet

RepLKNet backbone.

RepMLPNet

RepMLPNet backbone.

RepVGG

RepVGG backbone.

Res2Net

Res2Net backbone.

ResNeSt

ResNeSt backbone.

ResNeXt

ResNeXt backbone.

ResNet

ResNet backbone.

ResNetV1c

ResNetV1c backbone.

ResNetV1d

ResNetV1d backbone.

ResNet_CIFAR

ResNet backbone for CIFAR.

RevVisionTransformer

Reversible Vision Transformer.

SEResNeXt

SEResNeXt backbone.

SEResNet

SEResNet backbone.

SVT

The backbone of Twins-SVT.

ShuffleNetV1

ShuffleNetV1 backbone.

ShuffleNetV2

ShuffleNetV2 backbone.

SparseResNet

ResNet with sparse module conversion function.

SparseConvNeXt

ConvNeXt with sparse module conversion function.

SwinTransformer

Swin Transformer.

SwinTransformerV2

Swin Transformer V2.

T2T_ViT

Tokens-to-Token Vision Transformer (T2T-ViT).

TIMMBackbone

Wrapper to use backbones from timm library.

TNT

Transformer in Transformer.

VAN

Visual Attention Network.

VGG

VGG backbone.

Vig

Vision GNN backbone.

VisionTransformer

Vision Transformer.

ViTSAM

Vision Transformer as image encoder used in SAM.

XCiT

XCiT backbone.

ViTEVA02

EVA02 Vision Transformer.
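
Any backbone can be used standalone as a feature extractor. A minimal sketch; out_indices selects which stages to return, and ResNet-50's final stage has 2048 channels:

    import torch
    from mmpretrain.models import build_backbone

    backbone = build_backbone(dict(type='ResNet', depth=50, out_indices=(3, )))
    backbone.init_weights()
    feats = backbone(torch.rand(1, 3, 224, 224))  # a tuple, one entry per stage
    print(feats[0].shape)  # torch.Size([1, 2048, 7, 7])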

Necks

BEiTV2Neck

Neck for BEiTV2 Pre-training.

CAENeck

Neck for CAE Pre-training.

ClsBatchNormNeck

Normalize cls token across batch before head.

DenseCLNeck

The non-linear neck of DenseCL.

GeneralizedMeanPooling

Generalized Mean Pooling neck.

GlobalAveragePooling

Global Average Pooling neck.

HRFuseScales

Fuse feature map of multiple scales in HRNet.

LinearNeck

Linear neck with dimension projection.

MAEPretrainDecoder

Decoder for MAE Pre-training.

MILANPretrainDecoder

Prompt decoder for MILAN.

MixMIMPretrainDecoder

Decoder for MixMIM Pre-training.

MoCoV2Neck

The non-linear neck of MoCo v2: fc-relu-fc.

NonLinearNeck

The non-linear neck.

SimMIMLinearDecoder

Linear decoder for SimMIM Pre-training.

SwAVNeck

The non-linear neck of SwAV: fc-bn-relu-fc-normalization.

iTPNPretrainDecoder

The neck module of iTPN (transformer pyramid network).

SparKLightDecoder

The decoder for SparK, which upsamples the feature maps.
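
Necks consume the tuple of feature maps produced by a backbone. A minimal sketch with GlobalAveragePooling:

    import torch
    from mmpretrain.models import build_neck

    gap = build_neck(dict(type='GlobalAveragePooling'))
    outs = gap((torch.rand(1, 2048, 7, 7), ))
    print(outs[0].shape)  # torch.Size([1, 2048])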

Heads

ArcFaceClsHead

ArcFace classifier head.

BEiTV1Head

Head for BEiT v1 Pre-training.

BEiTV2Head

Head for BEiT v2 Pre-training.

CAEHead

Head for CAE Pre-training.

CSRAClsHead

Class-specific residual attention classifier head.

ClsHead

Classification head.

ConformerHead

Linear classifier head.

ContrastiveHead

Head for contrastive learning.

DeiTClsHead

Distilled Vision Transformer classifier head.

EfficientFormerClsHead

EfficientFormer classifier head.

LatentCrossCorrelationHead

Head for latent feature cross correlation.

LatentPredictHead

Head for latent feature prediction.

LeViTClsHead

LeViT classifier head.

LinearClsHead

Linear classifier head.

MAEPretrainHead

Head for MAE Pre-training.

MIMHead

Pre-training head for Masked Image Modeling.

MixMIMPretrainHead

Head for MixMIM Pre-training.

MoCoV3Head

Head for MoCo v3 Pre-training.

MultiLabelClsHead

Classification head for multilabel task.

MultiLabelLinearClsHead

Linear classification head for multilabel task.

MultiTaskHead

Multi task head.

SimMIMHead

Head for SimMIM Pre-training.

StackedLinearClsHead

Classifier head with several hidden fc layers and an output fc layer.

SwAVHead

Head for SwAV Pre-training.

VigClsHead

The classification head for Vision GNN.

VisionTransformerClsHead

Vision Transformer classifier head.

iTPNClipHead

Head for iTPN Pre-training using CLIP.

SparKPretrainHead

Pre-training head for SparK.
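
Heads consume the tuple of features coming out of the neck. A minimal sketch with LinearClsHead; the channel sizes are illustrative:

    import torch
    from mmpretrain.models import build_head

    head = build_head(dict(type='LinearClsHead', num_classes=1000,
                           in_channels=2048,
                           loss=dict(type='CrossEntropyLoss')))
    cls_score = head((torch.rand(4, 2048), ))
    print(cls_score.shape)  # torch.Size([4, 1000])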

Losses

AsymmetricLoss

Asymmetric loss.

CAELoss

Loss function for CAE.

CosineSimilarityLoss

Cosine similarity loss function.

CrossCorrelationLoss

Cross correlation loss function.

CrossEntropyLoss

Cross entropy loss.

FocalLoss

Focal loss.

LabelSmoothLoss

Label-smoothed cross entropy loss.

PixelReconstructionLoss

Loss for the reconstruction of pixel in Masked Image Modeling.

SeesawLoss

Implementation of seesaw loss.

SwAVLoss

The Loss for SwAV.
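
Losses are regular modules taking predictions and targets. A minimal sketch with LabelSmoothLoss; the smoothing value 0.1 is a common setting used here for illustration:

    import torch
    from mmpretrain.models import build_loss

    criterion = build_loss(dict(type='LabelSmoothLoss', label_smooth_val=0.1))
    cls_score = torch.rand(4, 10)
    label = torch.randint(0, 10, (4, ))
    print(criterion(cls_score, label))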

PEFT

LoRAModel

Implements LoRA in a module.
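
LoRAModel wraps another module's config and injects low-rank adapters into the layers matched by targets. A hedged sketch, assuming a ViT backbone and a regex targeting the attention qkv projections; the rank, alpha, and drop_rate values are illustrative:

    from mmpretrain.registry import MODELS

    backbone = MODELS.build(dict(
        type='LoRAModel',
        module=dict(type='VisionTransformer', arch='b'),
        alpha=16,    # LoRA scaling factor (illustrative)
        rank=16,     # adapter rank (illustrative)
        drop_rate=0.1,
        targets=[dict(type='.*qkv')]))  # match qkv linear layers by name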

models.utils

This package includes some helper functions and common components used in various networks.

Common Components

ConditionalPositionEncoding

The Conditional Position Encoding (CPE) module.

CosineEMA

CosineEMA updates the momentum parameter with a cosine schedule; it is used in BYOL, MoCoV3, etc.

HybridEmbed

CNN Feature Map Embedding.

InvertedResidual

Inverted Residual Block.

LayerScale

LayerScale layer.

MultiheadAttention

Multi-head Attention Module.

PatchEmbed

Image to Patch Embedding.

PatchMerging

Merge patch feature map.

SELayer

Squeeze-and-Excitation Module.

ShiftWindowMSA

Shift Window Multihead Self-Attention Module.

WindowMSA

Window based multi-head self-attention (W-MSA) module with relative position bias.

WindowMSAV2

Window based multi-head self-attention (W-MSA) module with continuous relative position bias, as introduced in Swin Transformer V2.
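
These components are importable from mmpretrain.models.utils and usable on their own. A minimal sketch with MultiheadAttention; the token count 197 corresponds to a ViT-B/16 at 224x224 (14x14 patches plus a cls token):

    import torch
    from mmpretrain.models.utils import MultiheadAttention

    attn = MultiheadAttention(embed_dims=768, num_heads=12)
    out = attn(torch.rand(2, 197, 768))
    print(out.shape)  # torch.Size([2, 197, 768])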

Helper Functions

channel_shuffle

Channel Shuffle operation.

is_tracing

Determine whether the model is being called during tracing with torch.jit.trace.

make_divisible

Make divisible function.

resize_pos_embed

Resize pos_embed weights.

resize_relative_position_bias_table

Resize relative position bias table.

to_ntuple

A to_tuple function generator.
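
A few of these helpers in action; the shapes follow the ViT conventions above, and resize_pos_embed's num_extra_tokens accounts for the cls token:

    import torch
    from mmpretrain.models.utils import (make_divisible, resize_pos_embed,
                                         to_ntuple)

    print(make_divisible(34, 8))  # 32, the nearest value divisible by 8
    to_2tuple = to_ntuple(2)
    print(to_2tuple(7))           # (7, 7)

    # Resize a position embedding from a 14x14 to a 16x16 patch grid.
    pos_embed = torch.rand(1, 1 + 14 * 14, 768)
    resized = resize_pos_embed(pos_embed, src_shape=(14, 14),
                               dst_shape=(16, 16), mode='bicubic',
                               num_extra_tokens=1)
    print(resized.shape)          # torch.Size([1, 257, 768])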