CLIPGenerator

class mmpretrain.models.selfsup.CLIPGenerator(tokenizer_path)[source]

Get the features and attention from the last layer of CLIP.

This module is used to generate target features in masked image modeling.

Parameters:

tokenizer_path (str) – Path to the CLIP checkpoint file.

forward(x)[source]

Get the features and attention from the last layer of CLIP.

Parameters:

x (torch.Tensor) – The input image, which is of shape (N, 3, H, W).

Returns:

The features and attention from the last layer of CLIP, of shape (N, L, C) and (N, L, L) respectively, where N is the batch size, L is the number of tokens, and C is the feature dimension.

Return type:

Tuple[torch.Tensor, torch.Tensor]
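
As a rough guide to the two return shapes, the following is a minimal sketch that computes (N, L, C) and (N, L, L) from an input of shape (N, 3, H, W). It assumes ViT-style patch tokenization; the helper name expected_clip_output_shapes, the patch_size and embed_dim parameters, and the with_cls_token flag are all hypothetical and do not come from mmpretrain. Whether L includes a class token depends on the CLIP variant in the checkpoint, so treat that flag as an assumption to verify.

```python
from typing import Tuple

Shape = Tuple[int, int, int]


def expected_clip_output_shapes(
    n: int,
    h: int,
    w: int,
    patch_size: int,
    embed_dim: int,
    with_cls_token: bool = True,
) -> Tuple[Shape, Shape]:
    """Return the (features, attention) shapes for an (n, 3, h, w) input.

    Hypothetical helper: assumes non-overlapping square patches and,
    optionally, one extra class token prepended to the sequence.
    """
    if h % patch_size or w % patch_size:
        raise ValueError("h and w must be divisible by patch_size")

    # L: one token per image patch, plus the class token if present.
    num_tokens = (h // patch_size) * (w // patch_size)
    if with_cls_token:
        num_tokens += 1

    features_shape = (n, num_tokens, embed_dim)    # (N, L, C)
    attention_shape = (n, num_tokens, num_tokens)  # (N, L, L)
    return features_shape, attention_shape


# Example: batch of 2 images at 224x224, 16x16 patches, 512-dim features
# (all illustrative values, not taken from a specific checkpoint).
features_shape, attention_shape = expected_clip_output_shapes(
    n=2, h=224, w=224, patch_size=16, embed_dim=512
)
print(features_shape, attention_shape)
```

Comparing these expected shapes against the tensors actually returned by forward(x) is a quick way to check that the checkpoint and the input resolution match.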