mmpretrain.visualization

This package includes the visualizer and some helper functions for visualization.

Visualizer

class mmpretrain.visualization.UniversalVisualizer(name='visualizer', image=None, vis_backends=None, save_dir=None, fig_save_cfg={'frameon': False}, fig_show_cfg={'frameon': False})

Universal Visualizer for multiple tasks.

Parameters:

name (str) – Name of the instance. Defaults to ‘visualizer’.
image (np.ndarray, optional) – The original image to draw on. The format should be RGB. Defaults to None.
vis_backends (list, optional) – Visual backend config list. Defaults to None.
save_dir (str, optional) – Directory in which all storage backends save their files. If it is None, the storage backends will not save any data. Defaults to None.
fig_save_cfg (dict) – Keyword parameters of the figure used for saving. Defaults to {'frameon': False}.
fig_show_cfg (dict) – Keyword parameters of the figure used for showing. Defaults to {'frameon': False}.
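The constructor arguments above can also be written as a config-style dict. The following is a minimal sketch rather than a definitive configuration: the backend type name LocalVisBackend is assumed to come from mmengine, the save directory is a hypothetical path, and the registry step that turns the dict into an object is not shown.

```python
# A config-style description of the visualizer (sketch only; building the
# object from this dict is left to the framework's registry).
vis_backends = [dict(type="LocalVisBackend")]  # assumed mmengine backend name

visualizer_cfg = dict(
    type="UniversalVisualizer",
    name="visualizer",
    vis_backends=vis_backends,
    save_dir="work_dirs/vis",          # hypothetical output directory
    fig_save_cfg=dict(frameon=False),  # same value as the signature default
    fig_show_cfg=dict(frameon=False),  # same value as the signature default
)
```

With save_dir left as None, the storage backends would not write anything, as described in the parameter list.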

visualize_cls(image, data_sample, classes=None, draw_gt=True, draw_pred=True, draw_score=True, resize=None, rescale_factor=None, text_cfg={}, show=False, wait_time=0, out_file=None, name='', step=0)

Visualize image classification result.

This method draws a text box on the input image to show classification information, such as the ground-truth label and the predicted label.

Parameters:

image (np.ndarray) – The image to draw. The format should be RGB.
data_sample (DataSample) – The annotation of the image.
classes (Sequence[str], optional) – The category names. Defaults to None.
draw_gt (bool) – Whether to draw ground-truth labels. Defaults to True.
draw_pred (bool) – Whether to draw prediction labels. Defaults to True.
draw_score (bool) – Whether to draw the prediction scores of prediction categories. Defaults to True.
resize (int, optional) – Resize the short edge of the image to the specified length before visualization. Defaults to None.
rescale_factor (float, optional) – Rescale the image by the rescale factor before visualization. Defaults to None.
text_cfg (dict) – Extra text setting, which accepts arguments of mmengine.Visualizer.draw_texts(). Defaults to an empty dict.
show (bool) – Whether to display the drawn image in a window. Make sure you have access to a graphical interface. Defaults to False.
wait_time (float) – The display time (s). Defaults to 0, which means “forever”.
out_file (str, optional) – Extra path to save the visualization result. If specified, the visualizer will only save the result image to the out_file and ignore its storage backends. Defaults to None.
name (str) – The image identifier. It’s useful when using the storage backends of the visualizer to save or display the image. Defaults to an empty string.
step (int) – The global step value. It’s useful to record a series of visualization results for the same image with the storage backends. Defaults to 0.

Returns:

The visualization image.

Return type:

np.ndarray
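A minimal usage sketch follows. It assumes mmpretrain and NumPy are installed, and that DataSample (from mmpretrain.structures, not documented on this page) provides the set_gt_label, set_pred_label and set_pred_score setters. The class names and the blank image are hypothetical placeholders.

```python
def draw_classification_demo(out_file=None):
    """Draw ground-truth and predicted labels on a blank image.

    Returns the drawn image, or None when the optional dependencies
    are not installed.
    """
    try:
        import numpy as np
        from mmpretrain.structures import DataSample  # assumed helper class
        from mmpretrain.visualization import UniversalVisualizer
    except ImportError:
        return None

    # A blank RGB image stands in for a real photo.
    image = np.zeros((224, 224, 3), dtype=np.uint8)

    # Ground truth is class 1, prediction is class 2 with a score vector.
    data_sample = (
        DataSample()
        .set_gt_label(1)
        .set_pred_label(2)
        .set_pred_score([0.1, 0.2, 0.7])
    )

    visualizer = UniversalVisualizer()
    return visualizer.visualize_cls(
        image,
        data_sample,
        classes=["cat", "dog", "bird"],  # hypothetical category names
        draw_score=True,
        out_file=out_file,  # if set, only this file is written
    )


result = draw_classification_demo()
```

Passing out_file saves the drawn image to that path only; leaving it as None hands the image to the visualizer's storage backends instead.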

visualize_i2t_retrieval(image, data_sample, prototype_dataset, topk=1, draw_score=True, resize=None, text_cfg={}, show=False, wait_time=0, out_file=None, name='', step=0)

Visualize Image-To-Text retrieval result.

This method will draw the input image and the texts retrieved from the prototype dataset.

Parameters:

image (np.ndarray) – The image to draw. The format should be RGB.
data_sample (DataSample) – The annotation of the image.
prototype_dataset (Sequence[str]) – The prototype dataset. It should be a list of texts.
topk (int) – The number of top matching items to visualize. Defaults to 1.
draw_score (bool) – Whether to draw the match scores of the retrieved texts. Defaults to True.
resize (int, optional) – Resize the short edge of the image to the specified length before visualization. Defaults to None.
text_cfg (dict) – Extra text setting, which accepts arguments of mmengine.Visualizer.draw_texts(). Defaults to an empty dict.
show (bool) – Whether to display the drawn image in a window. Make sure you have access to a graphical interface. Defaults to False.
wait_time (float) – The display time (s). Defaults to 0, which means “forever”.
out_file (str, optional) – Extra path to save the visualization result. If specified, the visualizer will only save the result image to the out_file and ignore its storage backends. Defaults to None.
name (str) – The image identifier. It’s useful when using the storage backends of the visualizer to save or display the image. Defaults to an empty string.
step (int) – The global step value. It’s useful to record a series of visualization results for the same image with the storage backends. Defaults to 0.

Returns:

The visualization image.

Return type:

np.ndarray
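The role of topk and draw_score can be illustrated with plain Python. This is a sketch of the selection logic only, not the library's implementation; the function name, the score values and the label format are hypothetical.

```python
def topk_matches(scores, texts, topk=1, draw_score=True):
    """Return the `topk` best-scoring texts, optionally prefixed by score."""
    # Indices sorted by descending score, truncated to the top-k entries.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    lines = []
    for i in ranked[:topk]:
        label = texts[i]
        if draw_score:
            label = f"{scores[i]:.2f} {label}"
        lines.append(label)
    return lines


captions = ["a cat on a sofa", "a dog in a park", "a bird on a branch"]
best_two = topk_matches([0.15, 0.70, 0.10], captions, topk=2)
print(best_two)  # ['0.70 a dog in a park', '0.15 a cat on a sofa']
```

Here prototype_dataset corresponds to the captions list: a plain sequence of candidate texts indexed by the retrieval result.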

visualize_image_caption(image, data_sample, resize=None, text_cfg={}, show=False, wait_time=0, out_file=None, name='', step=0)

Visualize image caption result.

This method draws the input image together with its caption.

Parameters:

image (np.ndarray) – The image to draw. The format should be RGB.
data_sample (DataSample) – The annotation of the image.
resize (int, optional) – Resize the long edge of the image to the specified length before visualization. Defaults to None.
text_cfg (dict) – Extra text setting, which accepts arguments of plt.text(). Defaults to an empty dict.
show (bool) – Whether to display the drawn image in a window. Make sure you have access to a graphical interface. Defaults to False.
wait_time (float) – The display time (s). Defaults to 0, which means “forever”.
out_file (str, optional) – Extra path to save the visualization result. If specified, the visualizer will only save the result image to the out_file and ignore its storage backends. Defaults to None.
name (str) – The image identifier. It’s useful when using the storage backends of the visualizer to save or display the image. Defaults to an empty string.
step (int) – The global step value. It’s useful to record a series of visualization results for the same image with the storage backends. Defaults to 0.

Returns:

The visualization image.

Return type:

np.ndarray
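Note that resize targets the long edge here, whereas visualize_cls and visualize_i2t_retrieval resize the short edge. The arithmetic can be sketched as follows, assuming the aspect ratio is preserved and sizes are rounded to the nearest integer; the helper name is hypothetical and the library's exact rounding may differ.

```python
def resized_shape(height, width, target, edge="long"):
    """Return (new_height, new_width) after scaling one edge to `target`."""
    if edge not in ("long", "short"):
        raise ValueError("edge must be 'long' or 'short'")
    # Pick the reference edge, then scale both sides by the same factor.
    reference = max(height, width) if edge == "long" else min(height, width)
    scale = target / reference
    return round(height * scale), round(width * scale)


print(resized_shape(480, 640, 320, edge="long"))   # (240, 320)
print(resized_shape(480, 640, 240, edge="short"))  # (240, 320)
```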

visualize_image_retrieval(image, data_sample, prototype_dataset, topk=1, draw_score=True, resize=None, text_cfg={}, show=False, wait_time=0, out_file=None, name='', step=0)

Visualize image retrieval result.

This method will draw the input image and the images retrieved from the prototype dataset.

Parameters:

image (np.ndarray) – The image to draw. The format should be RGB.
data_sample (DataSample) – The annotation of the image.
prototype_dataset (BaseDataset) – The prototype dataset. It should have a get_data_info method that returns a dict including img_path.
topk (int) – The number of top matching items to visualize. Defaults to 1.
draw_score (bool) – Whether to draw the match scores of the retrieved images. Defaults to True.
resize (int, optional) – Resize the long edge of the image to the specified length before visualization. Defaults to None.
text_cfg (dict) – Extra text setting, which accepts arguments of plt.text(). Defaults to an empty dict.
show (bool) – Whether to display the drawn image in a window. Make sure you have access to a graphical interface. Defaults to False.
wait_time (float) – The display time (s). Defaults to 0, which means “forever”.
out_file (str, optional) – Extra path to save the visualization result. If specified, the visualizer will only save the result image to the out_file and ignore its storage backends. Defaults to None.
name (str) – The image identifier. It’s useful when using the storage backends of the visualizer to save or display the image. Defaults to an empty string.
step (int) – The global step value. It’s useful to record a series of visualization results for the same image with the storage backends. Defaults to 0.

Returns:

The visualization image.

Return type:

np.ndarray
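The requirement on prototype_dataset can be illustrated with a small stand-in class. This is a hypothetical sketch of the interface described above (a get_data_info method returning a dict with img_path); whether the visualizer accepts an object that is not a BaseDataset subclass is an assumption, and the file paths are placeholders.

```python
class ListPrototypeDataset:
    """Hypothetical gallery exposing only the interface described above."""

    def __init__(self, img_paths):
        self.img_paths = list(img_paths)

    def __len__(self):
        return len(self.img_paths)

    def get_data_info(self, idx):
        # The retrieval result indexes into the gallery; the visualizer
        # needs the image path to load and draw the matched image.
        return {"img_path": self.img_paths[idx]}


gallery = ListPrototypeDataset(["gallery/0001.jpg", "gallery/0002.jpg"])
print(gallery.get_data_info(1)["img_path"])  # gallery/0002.jpg
```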

visualize_masked_image(image, data_sample, resize=224, color='black', alpha=0.8, show=False, wait_time=0, out_file=None, name='', step=0)

Visualize masked image.

This method will draw an image with binary mask.

Parameters:

image (np.ndarray) – The image to draw. The format should be RGB.
data_sample (DataSample) – The annotation of the image.
resize (int | Tuple[int]) – Resize the input image to the specified shape. Defaults to 224.
color (str | Tuple[int]) – The color of the binary mask. Defaults to “black”.
alpha (int | float) – The transparency of the mask. Defaults to 0.8.
show (bool) – Whether to display the drawn image in a window. Make sure you have access to a graphical interface. Defaults to False.
wait_time (float) – The display time (s). Defaults to 0, which means “forever”.
out_file (str, optional) – Extra path to save the visualization result. If specified, the visualizer will only save the result image to the out_file and ignore its storage backends. Defaults to None.
name (str) – The image identifier. It’s useful when using the storage backends of the visualizer to save or display the image. Defaults to an empty string.
step (int) – The global step value. It’s useful to record a series of visualization results for the same image with the storage backends. Defaults to 0.

Returns:

The visualization image.

Return type:

np.ndarray
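The effect of color and alpha on a masked pixel can be sketched as a weighted blend. This assumes alpha is the weight given to the mask color, so that alpha=1 fully hides the pixel; the library's exact blending may differ, and the helper name is hypothetical.

```python
def blend_mask(pixel, mask_color=(0, 0, 0), alpha=0.8):
    """Blend one RGB pixel with the mask color using weight `alpha`."""
    return tuple(
        round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, mask_color)
    )


print(blend_mask((200, 100, 50)))             # (40, 20, 10)
print(blend_mask((200, 100, 50), alpha=0.0))  # (200, 100, 50)
```

Under this assumption, the default black mask with alpha=0.8 leaves about a fifth of the original intensity visible in masked regions.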

visualize_t2i_retrieval(text, data_sample, prototype_dataset, topk=1, draw_score=True, text_cfg={}, fig_cfg={}, show=False, wait_time=0, out_file=None, name='', step=0)

Visualize Text-To-Image retrieval result.

This method will draw the input text and the images retrieved from the prototype dataset.

Parameters:

text (str) – The input text used to query the prototype dataset.
data_sample (DataSample) – The annotation of the image.
prototype_dataset (BaseDataset) – The prototype dataset. It should have a get_data_info method that returns a dict including img_path.
topk (int) – The number of top matching items to visualize. Defaults to 1.
draw_score (bool) – Whether to draw the match scores of the retrieved images. Defaults to True.
text_cfg (dict) – Extra text setting, which accepts arguments of plt.text(). Defaults to an empty dict.
fig_cfg (dict) – Extra figure setting, which accepts arguments of plt.Figure(). Defaults to an empty dict.
show (bool) – Whether to display the drawn image in a window. Make sure you have access to a graphical interface. Defaults to False.
wait_time (float) – The display time (s). Defaults to 0, which means “forever”.
out_file (str, optional) – Extra path to save the visualization result. If specified, the visualizer will only save the result image to the out_file and ignore its storage backends. Defaults to None.
name (str) – The image identifier. It’s useful when using the storage backends of the visualizer to save or display the image. Defaults to an empty string.
step (int) – The global step value. It’s useful to record a series of visualization results for the same image with the storage backends. Defaults to 0.

Returns:

The visualization image.

Return type:

np.ndarray

visualize_visual_grounding(image, data_sample, resize=None, text_cfg={}, show=False, wait_time=0, out_file=None, name='', line_width=3, bbox_color='green', step=0)

Visualize visual grounding result.

This method will draw the input image, bbox and the object.

Parameters:

image (np.ndarray) – The image to draw. The format should be RGB.
data_sample (DataSample) – The annotation of the image.
resize (int, optional) – Resize the long edge of the image to the specified length before visualization. Defaults to None.
text_cfg (dict) – Extra text setting, which accepts arguments of plt.text(). Defaults to an empty dict.
show (bool) – Whether to display the drawn image in a window. Make sure you have access to a graphical interface. Defaults to False.
wait_time (float) – The display time (s). Defaults to 0, which means “forever”.
out_file (str, optional) – Extra path to save the visualization result. If specified, the visualizer will only save the result image to the out_file and ignore its storage backends. Defaults to None.
name (str) – The image identifier. It’s useful when using the storage backends of the visualizer to save or display the image. Defaults to an empty string.
line_width (int | float) – The line width of the drawn bounding box. Defaults to 3.
bbox_color (str | Tuple[int]) – The color of the drawn bounding box. Defaults to ‘green’.
step (int) – The global step value. It’s useful to record a series of visualization results for the same image with the storage backends. Defaults to 0.

Returns:

The visualization image.

Return type:

np.ndarray

visualize_vqa(image, data_sample, resize=None, text_cfg={}, show=False, wait_time=0, out_file=None, name='', step=0)

Visualize visual question answering result.

This method will draw the input image, question and answer.

Parameters:

image (np.ndarray) – The image to draw. The format should be RGB.
data_sample (DataSample) – The annotation of the image.
resize (int, optional) – Resize the long edge of the image to the specified length before visualization. Defaults to None.
text_cfg (dict) – Extra text setting, which accepts arguments of plt.text(). Defaults to an empty dict.
show (bool) – Whether to display the drawn image in a window. Make sure you have access to a graphical interface. Defaults to False.
wait_time (float) – The display time (s). Defaults to 0, which means “forever”.
out_file (str, optional) – Extra path to save the visualization result. If specified, the visualizer will only save the result image to the out_file and ignore its storage backends. Defaults to None.
name (str) – The image identifier. It’s useful when using the storage backends of the visualizer to save or display the image. Defaults to an empty string.
step (int) – The global step value. It’s useful to record a series of visualization results for the same image with the storage backends. Defaults to 0.

Returns:

The visualization image.

Return type:

np.ndarray