Otter

class mmpretrain.models.multimodal.Otter(vision_encoder, lang_encoder, tokenizer, task='caption', zeroshot_prompt='', shot_prompt_tmpl='<image>User:Please describe the image. GPT:<answer>{caption}<|endofchunk|>', final_prompt_tmpl='<image>User:Please describe the image. GPT:<answer>', generation_cfg={}, data_preprocessor=None, init_cfg=None)[source]

The Otter model for multiple tasks.

Parameters:
  • vision_encoder (dict) – The config of the vision encoder.

  • lang_encoder (dict) – The config of the language encoder.

  • tokenizer (dict) – The tokenizer to encode the text.

  • task (str) – The task to perform prediction. Defaults to 'caption'.

  • zeroshot_prompt (str) – Prompt used for zero-shot inference. Defaults to an empty string.

  • shot_prompt_tmpl (str) – Prompt template used for few-shot inference. Defaults to '<image>User:Please describe the image. GPT:<answer>{caption}<|endofchunk|>'.

  • final_prompt_tmpl (str) – Final part of the prompt used for inference. Defaults to '<image>User:Please describe the image. GPT:<answer>'.

  • generation_cfg (dict) – The extra generation config; accepts the keyword arguments of transformers.GenerationConfig. Defaults to an empty dict.

  • data_preprocessor (Optional[dict]) – The config for preprocessing input data. If None or no type is specified, it will use "MultiModalDataPreprocessor" as the type. See MultiModalDataPreprocessor for more details. Defaults to None.

  • init_cfg (dict, optional) – The initialization config. Defaults to None.
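The prompt parameters above compose a few-shot prompt: each in-context example is rendered with shot_prompt_tmpl (filling in {caption}), and final_prompt_tmpl is appended so the model continues after the last <answer> tag. The sketch below shows that composition with plain string formatting; the template strings are copied from the signature, but the build_prompt helper and the shots-then-final assembly order are illustrative assumptions, not the library's internal code.

```python
# Template strings copied from the Otter signature.
shot_prompt_tmpl = ('<image>User:Please describe the image. '
                    'GPT:<answer>{caption}<|endofchunk|>')
final_prompt_tmpl = '<image>User:Please describe the image. GPT:<answer>'


def build_prompt(shot_captions):
    """Hypothetical helper: one filled shot per example caption,
    followed by the final (unanswered) prompt for the query image."""
    shots = ''.join(shot_prompt_tmpl.format(caption=c) for c in shot_captions)
    return shots + final_prompt_tmpl


# One in-context example plus the query image -> two <image> markers,
# and the prompt ends right where the model should start generating.
prompt = build_prompt(['a cat sleeping on a sofa'])
```

With zero shots, build_prompt([]) reduces to final_prompt_tmpl alone, which matches the zero-shot case where only zeroshot_prompt and the final template are used.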

post_process(outputs, data_samples)[source]

Perform post-processing on the generated outputs for the configured task.

Parameters:
  • outputs (torch.Tensor) – The generated outputs.

  • data_samples (List[DataSample], optional) – The annotation data of each sample.

Returns:

The list of post-processed data samples.

Return type:

List[DataSample]
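The pattern behind this method can be illustrated with a small self-contained sketch: decode the generated token ids back to text and attach the result to each data sample (creating samples when none were given). DummyTokenizer, the toy DataSample, and the pred_caption field are stand-ins for illustration, not the mmpretrain classes or the actual implementation.

```python
class DummyTokenizer:
    """Stand-in tokenizer with a tiny fixed vocabulary (id 0 = padding)."""
    vocab = {0: '', 1: 'a', 2: 'red', 3: 'bus'}

    def batch_decode(self, ids, skip_special_tokens=True):
        # Join non-padding tokens of each sequence into a string.
        return [' '.join(self.vocab[i] for i in seq if i) for seq in ids]


class DataSample(dict):
    """Toy sample container; the real DataSample is richer."""


def post_process(outputs, data_samples, tokenizer):
    """Decode generated ids and write the caption onto each sample."""
    texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    if data_samples is None:
        # No annotations were provided: create empty samples to fill.
        data_samples = [DataSample() for _ in texts]
    for sample, text in zip(data_samples, texts):
        sample['pred_caption'] = text.strip()
    return data_samples


samples = post_process([[1, 2, 3]], None, DummyTokenizer())
# samples[0]['pred_caption'] -> 'a red bus'
```

The key design point is that post-processing is separated from generation: the model emits raw token ids, and this step owns decoding and packaging them into data samples for downstream evaluation.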
