Blip2VQA

class mmpretrain.models.multimodal.Blip2VQA(vision_backbone, text_backbone, multimodal_backbone, vision_neck, tokenizer=None, prompt='', max_txt_len=20, num_captions=1, data_preprocessor=None, init_cfg=None)[source]

BLIP2 VQA.

Module for BLIP2 VQA task. For more details about the initialization params, please refer to Blip2Caption.
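
Below is a minimal inference sketch using mmpretrain's high-level VisualQuestionAnsweringInferencer rather than constructing Blip2VQA by hand. The checkpoint name and image path are assumptions for illustration; verify them against the mmpretrain model zoo.

>>> from mmpretrain import VisualQuestionAnsweringInferencer
>>> # 'blip2-opt2.7b_3rdparty-zeroshot_vqa' is an assumed model zoo name;
>>> # check list_models(task='Visual Question Answering') for available checkpoints.
>>> inferencer = VisualQuestionAnsweringInferencer('blip2-opt2.7b_3rdparty-zeroshot_vqa')
>>> results = inferencer('demo/cat-dog.png', 'What animal is next to the dog?')
>>> print(results[0]['pred_answer'])  # the generated answer string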

predict(images, data_samples=None, **kwargs)[source]

Predict answers from a batch of inputs.

Parameters:
  • images (torch.Tensor) – The input tensor with shape (N, C, …) in general.

  • data_samples (List[DataSample], optional) – The annotation data of every sample. Defaults to None.

  • **kwargs – Other keyword arguments accepted by the predict method of the head.

Returns:

A list of data samples with the predicted answers.

Return type:

List[DataSample]
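
For calling predict directly, here is a hedged sketch assuming an already-built Blip2VQA instance (obtained via get_model with an assumed checkpoint name) and assuming mmpretrain's usual VQA convention of carrying the question in each DataSample and returning the answer in pred_answer:

>>> import torch
>>> from mmpretrain import get_model
>>> from mmpretrain.structures import DataSample
>>> # Assumed checkpoint name; verify against the mmpretrain model zoo.
>>> model = get_model('blip2-opt2.7b_3rdparty-zeroshot_vqa', pretrained=True).eval()
>>> images = torch.rand(1, 3, 224, 224)  # dummy batch of one RGB image
>>> sample = DataSample()
>>> sample.set_field('How many cats are in the picture?', 'question')
>>> results = model.predict(images, data_samples=[sample])
>>> print(results[0].pred_answer)  # assumed output field, per mmpretrain VQA models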
