
BlipNLVR

class mmpretrain.models.multimodal.BlipNLVR(vision_backbone, multimodal_backbone, tokenizer=None, max_txt_len=35, data_preprocessor=None, init_cfg=None)[source]

BLIP NLVR.

Parameters:
  • vision_backbone (dict) – Backbone for extracting image features.

  • text_backbone (dict) – Backbone for extracting text features. Not needed here: the text extraction is integrated into the tokenizer step in datasets/transform/, so no separate text_backbone is required.

  • multimodal_backbone (Optional[dict]) – Backbone for extracting multi-modal features; applied as the fusion module.

  • neck (Optional[dict]) – The neck module to process features from backbone. Defaults to None.

  • head (Optional[dict]) – The head module to calculate loss from processed features. See mmpretrain.models.heads. Note that if the head is not set, the loss method cannot be used. Defaults to None.

  • tokenizer (Optional[dict]) – The config for the tokenizer. Defaults to None.

  • data_preprocessor (Optional[dict]) – The config for preprocessing input data. If None or if no type is specified, it will use "MultiModalDataPreprocessor" as the type. See MultiModalDataPreprocessor for more details. Defaults to None.

  • init_cfg (Optional[dict]) – The config to control the initialization. Defaults to None.
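
For context, here is a minimal construction sketch. Every sub-config below is an illustrative assumption; consult the BLIP-NLVR configs shipped with mmpretrain for the exact fields and values.

```python
# Construction sketch. Sub-config fields are assumptions, not the
# exact official config -- see mmpretrain's BLIP-NLVR configs.
from mmpretrain.registry import MODELS

cfg = dict(
    type='BlipNLVR',
    vision_backbone=dict(type='VisionTransformer', arch='b'),   # assumed
    multimodal_backbone=dict(type='XBertEncoder'),              # assumed
    tokenizer=dict(type='BlipTokenizer',
                   name_or_path='bert-base-uncased'),           # assumed
    max_txt_len=35,
)
model = MODELS.build(cfg)
```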

forward(images, data_samples=None, mode='tensor')[source]

The unified entry for a forward process in both training and test. The method accepts only one mode, "loss":

  • “loss”: Forward and return a dict of losses according to the given images and data samples.

Note that this method handles neither back propagation nor optimizer updating; both are done in train_step().

Parameters:
  • images (dict of torch.Tensor) – 'img': pre-processed image tensor of shape (N, C, …); 'text': tokenized text of shape (N, L).

  • data_samples (List[CaptionDataSample], optional) – The annotation data of every sample: 'image': raw image data; 'text': tokenized text.

  • mode (str) – The kind of value to return. Defaults to 'tensor'.

Returns:

The return type depends on mode.

  • If mode="loss", returns a dict of tensors.
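
A sketch of a loss-mode forward call, reusing model from the construction sketch above. The input keys and shapes follow the parameter description; in practice both the inputs and data_samples come from the dataset pipeline rather than hand-built tensors.

```python
import torch

# Loss-mode forward sketch. Image size and token-id range are assumed;
# data_samples is assumed to come from the dataset pipeline.
images = {
    'img': torch.rand(4, 3, 384, 384),          # assumed (N, C, H, W)
    'text': torch.randint(0, 30000, (4, 35)),   # assumed (N, L) token ids
}
losses = model(images, data_samples=data_samples, mode='loss')
print({name: value.item() for name, value in losses.items()})
```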

loss(images, data_samples)[source]

Calculate losses from a batch of inputs and data samples.

Parameters:
  • images (torch.Tensor) – The input tensor with shape (N, C, …) in general.

  • data_samples (List[ImageTextDataSample]) – The annotation data of every sample.

Returns:

A dictionary of loss components.

Return type:

dict[str, Tensor]
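
Because forward() handles neither back propagation nor optimizer updates (see the note above), a hand-rolled training step built on loss() would look roughly like the sketch below; model, images, data_samples, and optimizer are assumed to exist from the sketches above. Normally train_step() performs this for you.

```python
# Hand-rolled training step sketch; shown only to illustrate how the
# returned loss dict is consumed. train_step() normally does this.
losses = model.loss(images, data_samples)
total = sum(v for v in losses.values())   # combine all loss components
optimizer.zero_grad()
total.backward()
optimizer.step()
```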

predict(images, data_samples=None)[source]

Predict whether the text matches the given image pair.
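
A minimal inference sketch, reusing model and the batch from above. Reading pred_label from the returned samples is an assumption about the data-sample fields, not a documented API.

```python
import torch

# Inference sketch; the `pred_label` field is an assumed attribute of
# the returned data samples.
model.eval()
with torch.no_grad():
    results = model.predict(images, data_samples)
for sample in results:
    print(sample.pred_label)
```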
