Blip2VQA

class mmpretrain.models.multimodal.Blip2VQA(vision_backbone, text_backbone, multimodal_backbone, vision_neck, tokenizer=None, prompt='', max_txt_len=20, num_captions=1, data_preprocessor=None, init_cfg=None)[source]

BLIP2 VQA.

Module for the BLIP2 VQA task. For details about the initialization parameters, please refer to Blip2Caption.

predict(images, data_samples=None, **kwargs)[source]

Predict answers from a batch of inputs.

Parameters:
  • images (torch.Tensor) – The input tensor, with shape (N, C, …) in general.

  • data_samples (List[DataSample], optional) – The annotation data of each sample. Defaults to None.

  • **kwargs – Other keyword arguments accepted by the head's predict method.

Returns:

A list of data samples.

Return type:
List[DataSample]
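To make the predict() contract concrete, here is a minimal, self-contained sketch of its calling convention: a batch of images plus optional data samples in, the same number of (annotated) data samples out. Everything below (SimpleDataSample, ToyVQAModel, the pred_answer field's value) is a hypothetical stand-in using only the standard library; the real Blip2VQA operates on torch.Tensor batches and mmpretrain's DataSample objects.

```python
# Hedged sketch of the Blip2VQA.predict() interface; these classes are
# illustrative stand-ins, NOT mmpretrain APIs.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleDataSample:
    """Stand-in for mmpretrain's DataSample annotation container."""
    question: str = ''
    pred_answer: str = ''


class ToyVQAModel:
    """Mimics the predict() signature: a batch of images plus optional
    per-sample annotations in, one annotated data sample per input out."""

    def predict(self, images,
                data_samples: Optional[List[SimpleDataSample]] = None,
                **kwargs) -> List[SimpleDataSample]:
        # If no annotations were supplied, create one empty sample per image,
        # matching the "Defaults to None" behavior described above.
        if data_samples is None:
            data_samples = [SimpleDataSample() for _ in images]
        for sample in data_samples:
            # A real model would run the vision/text/multimodal backbones here.
            sample.pred_answer = 'unknown'
        return data_samples


batch = [object(), object()]  # placeholder for an (N, C, ...) tensor
results = ToyVQAModel().predict(batch)
print(len(results), results[0].pred_answer)  # -> 2 unknown
```

The key invariant to take away is that predict() always returns a List[DataSample] of the same length as the input batch, whether or not data_samples was provided.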