Blip2Caption

class mmpretrain.models.multimodal.Blip2Caption(vision_backbone, text_backbone, multimodal_backbone, vision_neck, tokenizer=None, prompt='', max_txt_len=20, num_captions=1, data_preprocessor=None, init_cfg=None)[source]

BLIP2 Caption.

Module for the BLIP-2 caption task.

Parameters:
  • vision_backbone (dict) – The config dict for vision backbone.

  • text_backbone (dict) – The config dict for text backbone.

  • multimodal_backbone (dict) – The config dict for multimodal backbone.

  • vision_neck (dict) – The config dict for vision neck.

  • tokenizer (Optional[dict]) – The config for the tokenizer. Defaults to None.

  • prompt (str) – The prompt used for training and evaluation. Defaults to ''.

  • max_txt_len (int) – The maximum length of the input text. Defaults to 20.

  • num_captions (int) – The number of captions to generate for each image. Defaults to 1.

  • data_preprocessor (Optional[dict]) – The config for preprocessing input data. If None or if no type is specified, it will use “MultiModalDataPreprocessor” as the type. See MultiModalDataPreprocessor for more details. Defaults to None.

  • init_cfg (Optional[dict]) – The config to control the initialization. Defaults to None.
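
A hedged usage sketch, assuming mmpretrain's high-level inference API; the model name and image path below are illustrative guesses, so use list_models() to confirm what your installation actually registers:

    # Run a BLIP-2 caption model through mmpretrain's inferencer API.
    from mmpretrain import ImageCaptionInferencer, list_models

    # Print the registered image-caption models; BLIP-2 variants should appear.
    print(list_models(task='Image Caption'))

    # The model name is an assumption -- substitute one printed above.
    inferencer = ImageCaptionInferencer('blip2-opt2.7b_3rdparty-zeroshot_caption')
    result = inferencer('demo/cat-dog.png')[0]  # hypothetical image path
    print(result['pred_caption'])               # the generated caption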

forward(images, data_samples=None, mode='loss')[source]

The unified entry point for the forward process in both training and test. The method accepts two modes: “predict” and “loss”:

  • “predict”: Forward and return the predictions, which are fully processed to a list of DataSample.

  • “loss”: Forward and return a dict of losses according to the given inputs and data samples.

Note that this method doesn’t handle back propagation or optimizer updating, which are done in train_step().

Parameters:
  • images (torch.Tensor) – The pre-processed image tensor of shape (N, C, …).

  • data_samples (List[DataSample], optional) – The annotation data of every sample. Defaults to None.

  • mode (str) – Which kind of value to return. Defaults to ‘loss’.

Returns:

The return type depends on mode.
  • If mode="loss", return a dict of tensors.

  • If mode="predict", return a list of DataSample.
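
A minimal sketch of the two modes, assuming model is an already-built Blip2Caption and that ground-truth captions live in a gt_caption field (an assumption; match it to your dataset's annotations):

    import torch
    from mmpretrain.structures import DataSample

    # Placeholder pre-processed batch; the 224x224 size is an assumption.
    images = torch.rand(2, 3, 224, 224)

    data_samples = []
    for caption in ['a cat on a sofa', 'two dogs in a park']:
        sample = DataSample()
        sample.gt_caption = caption  # assumed ground-truth field for the loss
        data_samples.append(sample)

    # "loss" mode: returns a dict of loss tensors, consumed by train_step().
    losses = model(images, data_samples, mode='loss')

    # "predict" mode: returns a list of fully processed DataSample objects.
    with torch.no_grad():
        predictions = model(images, data_samples, mode='predict')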

loss(images, data_samples=None, **kwargs)[source]

The forward function in training.

Parameters:
  • images (torch.Tensor) – The input tensor with shape (N, C, …) in general.

  • data_samples (List[DataSample], optional) – The annotation data of every sample. Defaults to None.

  • **kwargs – Other keyword arguments accepted by the loss method of the head.

Returns:

A dictionary of loss components.

Return type:

Dict[str, torch.Tensor]
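
Continuing the sketch above, a training-style use of loss(); parse_losses() is assumed to be inherited from MMEngine's BaseModel, which reduces the loss dict (in a real run, the MMEngine runner does this inside train_step()):

    # Compute and reduce the loss dict, then backpropagate manually.
    loss_dict = model.loss(images, data_samples)
    # parse_losses() sums the individual loss tensors and also returns
    # scalar logging variables; it is assumed here from mmengine's BaseModel.
    parsed_loss, log_vars = model.parse_losses(loss_dict)
    parsed_loss.backward()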

predict(images, data_samples=None, **kwargs)[source]

Predict captions from a batch of inputs.

Parameters:
  • images (torch.Tensor) – The input tensor with shape (N, C, …) in general.

  • data_samples (List[DataSample], optional) – The annotation data of every sample. Defaults to None.

  • **kwargs – Other keyword arguments accepted by the predict method of the head.

Returns:

A list of data samples.

Return type:

List[DataSample]
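
Continuing the same sketch, the generated captions can be read off the returned data samples; the pred_caption field name follows mmpretrain's caption models but is an assumption to verify on your installed version:

    # Generate captions and read them off the returned DataSample objects.
    predictions = model.predict(images)  # data_samples may be omitted
    for i, sample in enumerate(predictions):
        # pred_caption is assumed to hold the generated caption string
        print(f'image {i}: {sample.pred_caption}')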
