Blip2VQA
- class mmpretrain.models.multimodal.Blip2VQA(vision_backbone, text_backbone, multimodal_backbone, vision_neck, tokenizer=None, prompt='', max_txt_len=20, num_captions=1, data_preprocessor=None, init_cfg=None)[source]
BLIP2 VQA.
Module for the BLIP2 VQA task. For more details about the initialization parameters, please refer to Blip2Caption.
- predict(images, data_samples=None, **kwargs)[source]
Predict answers from a batch of inputs.
- Parameters:
images (torch.Tensor) – The input tensor, with shape (N, C, …) in general.
data_samples (List[DataSample], optional) – The annotation data of each sample. Defaults to None.
**kwargs – Other keyword arguments accepted by the predict method of the head.
- Returns:
A list of data samples containing the predicted answers.
- Return type:
List[DataSample]
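A minimal sketch of the `predict` call pattern described above. Building a real `Blip2VQA` requires mmpretrain configs and pretrained weights, so this example uses a hypothetical stub class (`StubBlip2VQA` and the simplified `DataSample` below are stand-ins, not mmpretrain APIs) to illustrate the interface: a batched image tensor of shape (N, C, …) goes in, and a list of `DataSample` objects with per-sample predictions comes back.

```python
import torch


class DataSample:
    """Illustrative stand-in for mmpretrain.structures.DataSample."""

    def __init__(self, question=""):
        self.question = question
        self.pred_answer = None


class StubBlip2VQA:
    """Hypothetical stub mirroring the Blip2VQA.predict signature."""

    def predict(self, images, data_samples=None, **kwargs):
        # images: torch.Tensor with shape (N, C, H, W); one answer per sample.
        if data_samples is None:
            data_samples = [DataSample() for _ in range(images.size(0))]
        for sample in data_samples:
            # A real model would decode an answer conditioned on the
            # image features and the sample's question; the stub just
            # writes a placeholder string.
            sample.pred_answer = "stub answer"
        return data_samples


# Usage: batch of 2 RGB images plus one annotated DataSample per image.
images = torch.rand(2, 3, 224, 224)
samples = [DataSample("What is in the image?"),
           DataSample("How many cats are there?")]
results = StubBlip2VQA().predict(images, samples)
print([s.pred_answer for s in results])
```

The returned list has the same length as the batch, matching the `List[DataSample]` return type documented above.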