
Data Process

In MMPreTrain, the data process and the dataset are decoupled. The datasets only define how to get samples’ basic information from the file system. This basic information includes the ground-truth label and the raw image data or the paths of images. The data process includes data transforms, data preprocessors and batch augmentations.

  • Data Transforms: Transforms include loading, preprocessing, formatting and so on.

  • Data Preprocessors: Processes include collating, normalization, stacking, channel flipping and so on.

  • Batch Augmentations: Batch augmentations involve multiple samples, such as Mixup and CutMix.

Data Transforms

To prepare the input data, we need to apply some transforms to this basic information. These transforms include loading, preprocessing and formatting, and a series of data transforms makes up a data pipeline. Therefore, you can find a pipeline argument in the dataset configs, for example:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    ....
    dataset=dict(
        pipeline=train_pipeline,
        ....),
    ....
)

Every item of a pipeline list is one of the following data transform classes. If you want to add a custom data transformation class, the tutorial Custom Data Pipelines will help you.
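
For reference, below is a minimal sketch of such a custom transform. It assumes the MMPreTrain 1.x registry layout (mmpretrain.registry.TRANSFORMS) and mmcv's BaseTransform interface; the class name and its gamma parameter are purely illustrative.

import numpy as np
from mmcv.transforms import BaseTransform
from mmpretrain.registry import TRANSFORMS


@TRANSFORMS.register_module()
class MyGammaCorrection(BaseTransform):  # hypothetical transform for illustration
    """Apply a fixed gamma correction to the loaded image."""

    def __init__(self, gamma=1.2):
        self.gamma = gamma

    def transform(self, results):
        # `results['img']` is the uint8 array produced by `LoadImageFromFile`.
        img = results['img'].astype(np.float32) / 255.0
        results['img'] = (np.power(img, self.gamma) * 255.0).astype(np.uint8)
        return results

After registration, it can be referenced in a pipeline as dict(type='MyGammaCorrection', gamma=1.2).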

Loading and Formatting

LoadImageFromFile

Load an image from file.

PackInputs

Pack the inputs data.

PackMultiTaskInputs

Convert all image labels of a multi-task dataset to a dict of tensors.

PILToNumpy

Convert img to numpy.ndarray.

NumpyToPIL

Convert the image from OpenCV format to PIL.Image.Image.

Transpose

Transpose numpy array.

Collect

Collect and only reserve the specified fields.

Processing and Augmentation

Albumentations

Wrapper to use augmentation from albumentations library.

CenterCrop

Crop the center of the image, segmentation masks, bounding boxes and key points.

ColorJitter

Randomly change the brightness, contrast and saturation of an image.

EfficientNetCenterCrop

EfficientNet style center crop.

EfficientNetRandomCrop

EfficientNet style RandomResizedCrop.

Lighting

Adjust images lighting using AlexNet-style PCA jitter.

Normalize

Normalize the image.

RandomCrop

Crop the given image at a random location.

RandomErasing

Randomly select a rectangle region in an image and erase its pixels.

RandomFlip

Flip the image & bbox & keypoints & segmentation map.

RandomGrayscale

Randomly convert image to grayscale with a probability.

RandomResize

Random resize images & bbox & keypoints.

RandomResizedCrop

Crop the given image to random scale and aspect ratio.

Resize

Resize images & bbox & seg & keypoints.

ResizeEdge

Resize images along the specified edge.

BEiTMaskGenerator

Generate mask for image.

SimMIMMaskGenerator

Generate a random block mask for each image.
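
For reference, a mask generator is usually placed near the end of a self-supervised pre-training pipeline, before packing. The sketch below follows the argument names found in typical SimMIM configs (input_size, mask_patch_size, model_patch_size, mask_ratio); treat the exact values as assumptions to verify against your config.

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=192, crop_ratio_range=(0.67, 1.0)),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(
        type='SimMIMMaskGenerator',
        input_size=192,        # size of the cropped image
        mask_patch_size=32,    # size of each masked block
        model_patch_size=4,    # patch size of the backbone
        mask_ratio=0.6),       # fraction of patches to mask
    dict(type='PackInputs'),
]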

Composed Augmentation

Composed augmentations are methods that compose a series of data augmentation transforms, such as AutoAugment and RandAugment.

AutoAugment

Auto augmentation.

RandAugment

Random augmentation.
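
As an illustration, RandAugment is configured directly in the training pipeline. The sketch below uses the argument names seen in common MMPreTrain configs (policies, num_policies, magnitude_level, etc.); the specific values are only an assumption.

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(
        type='RandAugment',
        policies='timm_increasing',  # a predefined policy set
        num_policies=2,              # number of policies applied per image
        total_level=10,
        magnitude_level=9,
        magnitude_std=0.5),
    dict(type='PackInputs'),
]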

The above transforms are composed of a group of policies chosen from the random transforms below:

AutoContrast

Auto adjust image contrast.

Brightness

Adjust images brightness.

ColorTransform

Adjust images color balance.

Contrast

Adjust images contrast.

Cutout

Cutout images.

Equalize

Equalize the image histogram.

GaussianBlur

Gaussian blur images.

Invert

Invert images.

Posterize

Posterize images (reduce the number of bits for each color channel).

Rotate

Rotate images.

Sharpness

Adjust images sharpness.

Shear

Shear images.

Solarize

Solarize images (invert all pixel values above a threshold).

SolarizeAdd

SolarizeAdd images (add a certain value to pixels below a threshold).

Translate

Translate images.

BaseAugTransform

The base class of augmentation transform for RandAugment.
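
Instead of a predefined policy set, you can also pass an explicit list of the above random transforms as policies. The sketch below follows the policy format used in common MMPreTrain configs, assuming each policy accepts a magnitude_range that maps the shared magnitude level to its own argument; consider it a template rather than a verified recipe.

custom_policies = [
    dict(type='AutoContrast'),
    dict(type='Equalize'),
    dict(type='Rotate', magnitude_range=(0, 30)),
    dict(type='Posterize', magnitude_range=(4, 0)),
    dict(type='Solarize', magnitude_range=(256, 0)),
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandAugment', policies=custom_policies, num_policies=2, magnitude_level=9),
    dict(type='PackInputs'),
]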

MMCV transforms

We also provide many transforms from MMCV, which you can use directly in the config files. The whole transform list can be found in mmcv.transforms.
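
For example, transforms registered in MMCV, such as Resize and Pad, can be referenced by name in a pipeline just like MMPreTrain's own transforms; the snippet below is a minimal sketch and the argument values are only illustrative.

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(256, 256)),  # mmcv.transforms.Resize
    dict(type='Pad', size=(256, 256)),      # mmcv.transforms.Pad
    dict(type='PackInputs'),
]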

Transform Wrapper

MultiView

A transform wrapper for multiple views of an image.
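
For instance, self-supervised configs typically use MultiView to produce several augmented views of each image for contrastive learning. The sketch below assumes the num_views and transforms arguments used in typical MMPreTrain self-supervised configs; the inner view pipeline is only illustrative.

view_pipeline = [
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    # Generate 2 views of each image with the same view pipeline.
    dict(type='MultiView', num_views=2, transforms=[view_pipeline]),
    dict(type='PackInputs'),
]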

TorchVision Transforms

We also support all the transforms in TorchVision. You can use them as in the following examples:

1. Use some TorchVision Augs Surrounded by NumpyToPIL and PILToNumpy (Recommended)

Wrap the TorchVision augmentations between dict(type='NumpyToPIL', to_rgb=True) and dict(type='PILToNumpy', to_bgr=True):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),     # from BGR in cv2 to RGB  in PIL
    dict(type='torchvision/RandomResizedCrop', size=176),
    dict(type='PILToNumpy', to_bgr=True),     # from RGB  in PIL to BGR in cv2
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,                          # from BGR in cv2 to RGB  in PIL
)

2. Use TorchVision Augs and ToTensor&Normalize

Make sure the 'img' has been converted to PIL format from the BGR NumPy format before it is processed by the TorchVision augmentations.

import torch  # needed so that `torch.float` below resolves in the config

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),       # from BGR in cv2 to RGB  in PIL
    dict(
        type='torchvision/RandomResizedCrop',
        size=176,
        interpolation='bilinear'),            # accept str format interpolation mode
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(
        type='torchvision/TrivialAugmentWide',
        interpolation='bilinear'),
    dict(type='torchvision/PILToTensor'),
    dict(type='torchvision/ConvertImageDtype', dtype=torch.float),
    dict(
        type='torchvision/Normalize',
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
    ),
    dict(type='torchvision/RandomErasing', p=0.1),
    dict(type='PackInputs'),
]

data_preprocessor = dict(num_classes=1000, mean=None, std=None, to_rgb=False)  # Normalize in dataset pipeline

3. Use TorchVision Augs Except ToTensor&Normalize

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),   # from BGR in cv2 to RGB  in PIL
    dict(type='torchvision/RandomResizedCrop', size=176, interpolation='bilinear'),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(type='torchvision/TrivialAugmentWide', interpolation='bilinear'),
    dict(type='PackInputs'),
]

# here the normalization parameters are for the RGB format
data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=False,
)

Data Preprocessors

The data preprocessor is also a component that processes the data before feeding it to the neural network. Compared with the data transforms, the data preprocessor is a module of the classifier, and it processes a whole batch of data at a time, which means it can use the GPU and batching to accelerate the processing.

The default data preprocessor in MMPreTrain can do the following pre-processing:

  1. Move data to the target device.

  2. Pad inputs to the maximum size of the current batch.

  3. Stack inputs to a batch.

  4. Convert inputs from BGR to RGB if the shape of the input is (3, H, W).

  5. Normalize image with defined std and mean.

  6. Do batch augmentations like Mixup and CutMix during training.

You can configure the data preprocessor by the data_preprocessor field or the model.data_preprocessor field in the config file. Typical usage is shown below:

data_preprocessor = dict(
    # RGB format normalization parameters
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,    # convert image from BGR to RGB
)

Or define it in model.data_preprocessor as follows:

model = dict(
    backbone=...,
    neck=...,
    head=...,
    data_preprocessor=dict(
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    train_cfg=...,
)

Note that the model.data_preprocessor has higher priority than data_preprocessor.

ClsDataPreprocessor

Image pre-processor for classification tasks.

SelfSupDataPreprocessor

Image pre-processor for operations like normalization and BGR-to-RGB conversion.

TwoNormDataPreprocessor

Image pre-processor for CAE, BEiT v1/v2, etc.

VideoDataPreprocessor

Video pre-processor for operations like normalization and BGR-to-RGB conversion.

Batch Augmentations

Batch augmentation is a component of the data preprocessor. It involves multiple samples and mixes them in some way, such as Mixup and CutMix.

These augmentations are usually only used during training; therefore, we use the model.train_cfg field to configure them in config files.

model = dict(
    backbone=...,
    neck=...,
    head=...,
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0),
    ]),
)

You can also specify the probabilities of every batch augmentation by the probs field.

model = dict(
    backbone=...,
    neck=...,
    head=...,
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0),
    ], probs=[0.3, 0.7])
)

Here is a list of the batch augmentations that can be used in MMPreTrain.

Mixup

Mixup batch augmentation.

CutMix

CutMix batch augmentation.

ResizeMix

ResizeMix Random Paste layer for a batch of data.
