Blip2VQA
- class mmpretrain.models.multimodal.Blip2VQA(vision_backbone, text_backbone, multimodal_backbone, vision_neck, tokenizer=None, prompt='', max_txt_len=20, num_captions=1, data_preprocessor=None, init_cfg=None)[source]
BLIP2 VQA.
Module for the BLIP2 VQA task. For more details about the initialization parameters, please refer to Blip2Caption.
- predict(images, data_samples=None, **kwargs)[source]
Predict answers from a batch of inputs.
- Parameters:
images (torch.Tensor) – The input tensor, with shape (N, C, …) in general.
data_samples (List[DataSample], optional) – The annotation data of each sample. Defaults to None.
**kwargs – Other keyword arguments accepted by the predict method of the head.
- Returns:
A list of data samples containing the predicted answers.
- Return type:
List[DataSample]
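A minimal sketch of the `predict` call pattern described above. Building a real `Blip2VQA` requires mmpretrain configs and pretrained weights, so this example uses a hypothetical stub class (`StubBlip2VQA` and the simplified `DataSample` below are stand-ins, not mmpretrain APIs) to illustrate the interface: a batched image tensor of shape (N, C, …) goes in, and a list of `DataSample` objects with per-sample predictions comes back.

```python
import torch


class DataSample:
    """Illustrative stand-in for mmpretrain.structures.DataSample."""

    def __init__(self, question=""):
        self.question = question
        self.pred_answer = None


class StubBlip2VQA:
    """Hypothetical stub mirroring the Blip2VQA.predict signature."""

    def predict(self, images, data_samples=None, **kwargs):
        # images: torch.Tensor with shape (N, C, H, W); one answer per sample.
        if data_samples is None:
            data_samples = [DataSample() for _ in range(images.size(0))]
        for sample in data_samples:
            # A real model would decode an answer conditioned on the
            # image features and the sample's question; the stub just
            # writes a placeholder string.
            sample.pred_answer = "stub answer"
        return data_samples


# Usage: batch of 2 RGB images plus one annotated DataSample per image.
images = torch.rand(2, 3, 224, 224)
samples = [DataSample("What is in the image?"),
           DataSample("How many cats are there?")]
results = StubBlip2VQA().predict(images, samples)
print([s.pred_answer for s in results])
```

The returned list has the same length as the batch, matching the `List[DataSample]` return type documented above.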