Data Process¶

In MMPreTrain, data processing and the dataset are decoupled. The datasets only define how to get the basic information of each sample from the file system. This basic information includes the ground-truth label and either the raw image data or the image paths. Data processing includes data transforms, data preprocessors and batch augmentations.

- Data Transforms: Transforms include loading, preprocessing, formatting, etc.
- Data Preprocessors: Processes include collation, normalization, stacking, channel flipping, etc.
- Batch Augmentations: Batch augmentation involves multiple samples, such as Mixup and CutMix.

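Data Transforms¶

To prepare the input data, we need to apply some transforms to the basic information stored in the dataset. These transforms include loading, part of the preprocessing and augmentation, and formatting. A sequence of data transforms makes up a data pipeline. Therefore, the dataset config usually contains a pipeline parameter, for example: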

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    # ... other dataloader settings
    dataset=dict(
        # ... other dataset settings
        pipeline=train_pipeline,
    ),
)

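Every item in the pipeline list is one of the following data transform classes. If you want to add a custom data transform class, refer to the Custom Data Pipelines tutorial.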
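
How a list of config dicts becomes a chain of callable transforms can be sketched in plain Python. This is a minimal illustration built on a toy registry, not the MMPreTrain implementation: `TOY_REGISTRY`, `register`, `build_pipeline`, `run_pipeline`, `LoadValue` and `Scale` are hypothetical names invented for this example. The idea it shows is that `type` selects a class, the remaining keys become constructor arguments, and the built transforms are applied in order to a shared results dict.

```python
# A toy sketch of a config-driven data pipeline (not the MMPreTrain implementation).
TOY_REGISTRY = {}  # hypothetical registry: class name -> class


def register(cls):
    """Register a transform class under its class name."""
    TOY_REGISTRY[cls.__name__] = cls
    return cls


@register
class LoadValue:
    """Stand-in for a loading transform: puts a fake 'image' into the results."""

    def __call__(self, results):
        results['img'] = [1.0, 2.0, 3.0]
        return results


@register
class Scale:
    """Stand-in for a processing transform: multiplies every value by `factor`."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, results):
        results['img'] = [v * self.factor for v in results['img']]
        return results


def build_pipeline(cfgs):
    """Build each transform from its config dict; `type` picks the class."""
    transforms = []
    for cfg in cfgs:
        kwargs = dict(cfg)  # copy so the config itself is left untouched
        cls = TOY_REGISTRY[kwargs.pop('type')]
        transforms.append(cls(**kwargs))
    return transforms


def run_pipeline(transforms, results):
    """Apply the transforms one after another to the results dict."""
    for transform in transforms:
        results = transform(results)
    return results


toy_pipeline = [dict(type='LoadValue'), dict(type='Scale', factor=2.0)]
sample = run_pipeline(build_pipeline(toy_pipeline), {'img_path': 'demo.jpg'})
print(sample['img'])
```

Because each transform only reads and writes keys of the results dict, transforms can be reordered or swapped by editing the config list alone.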

Loading and Formatting ¶

`LoadImageFromFile`	Load an image from a file path.
`PackInputs`	Pack the input data.
`PackMultiTaskInputs`	Convert all image labels of a multi-task dataset to a dict of tensors.
`PILToNumpy`	Convert img to `numpy.ndarray`.
`NumpyToPIL`	Convert the image from OpenCV format to `PIL.Image.Image`.
`Transpose`	Transpose a NumPy array.
`Collect`	Collect and keep only the specified fields.

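Processing and Augmentation ¶

`Albumentations`	Wrapper for data transforms from the Albumentations library.
`CenterCrop`	Crop the center of the image, segmentation masks, bounding boxes and key points.
`ColorJitter`	Randomly change the brightness, contrast and saturation of an image.
`EfficientNetCenterCrop`	EfficientNet-style center crop.
`EfficientNetRandomCrop`	EfficientNet-style random resized crop.
`Lighting`	Randomly adjust image lighting using AlexNet-style PCA jitter.
`Normalize`	Normalize the image.
`RandomCrop`	Crop the given image at a random location.
`RandomErasing`	Randomly select a rectangular region in the image and erase its pixels.
`RandomFlip`	Randomly flip the image, bbox, keypoints, etc.
`RandomGrayscale`	Randomly convert the image to grayscale.
`RandomResize`	Randomly resize the image, bbox, keypoints, etc.
`RandomResizedCrop`	Crop the given image to a random size and aspect ratio.
`Resize`	Resize the image, bbox, segmentation map, keypoints, etc.
`ResizeEdge`	Resize the image along the specified edge.
`BEiTMaskGenerator`	Generate a mask for the image.
`SimMIMMaskGenerator`	Generate a random block mask for each image.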

Composed Augmentation¶

Composed augmentation combines a series of data augmentation methods to augment a sample as a whole, for example AutoAugment and RandAugment.

`AutoAugment`	Auto augmentation.
`RandAugment`	Random augmentation.

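The above transforms are composed of a group of policies drawn from the random transforms below:

`AutoContrast`	Automatically adjust image contrast.
`Brightness`	Adjust image brightness.
`ColorTransform`	Adjust image color balance.
`Contrast`	Adjust image contrast.
`Cutout`	Erase part of the image.
`Equalize`	Equalize the image histogram.
`GaussianBlur`	Apply Gaussian blur to images.
`Invert`	Invert the image.
`Posterize`	Posterize the image (reduce the number of bits for each color channel).
`Rotate`	Rotate the image.
`Sharpness`	Adjust image sharpness.
`Shear`	Shear the image.
`Solarize`	Solarize the image (invert all pixel values above a threshold).
`SolarizeAdd`	SolarizeAdd the image (add a fixed value to all pixel values below a threshold).
`Translate`	Translate the image.
`BaseAugTransform`	Base class for the transforms used in composed augmentation.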
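
The idea of composing an augmentation from a pool of random policies can be sketched as follows. This is a simplified illustration in the spirit of RandAugment, not the MMPreTrain implementation: the toy "image" is a list of pixel values, the three policies are reduced stand-ins for the transforms in the table, and `POLICY_POOL` and `rand_augment_sketch` are hypothetical names. Sampling with replacement is also an assumption of this sketch.

```python
import random


# Toy "images" are lists of pixel values in [0, 255].
def invert(img):
    """Invert every pixel value."""
    return [255 - v for v in img]


def brightness(img, delta=10):
    """Brighten every pixel by `delta`, clipped at 255."""
    return [min(255, v + delta) for v in img]


def solarize(img, thr=128):
    """Invert only the pixels at or above the threshold."""
    return [255 - v if v >= thr else v for v in img]


POLICY_POOL = [invert, brightness, solarize]  # hypothetical policy pool


def rand_augment_sketch(img, num_policies, rng):
    """Pick `num_policies` policies at random and apply them in sequence."""
    chosen = rng.choices(POLICY_POOL, k=num_policies)  # assumption: with replacement
    for policy in chosen:
        img = policy(img)
    return img, [policy.__name__ for policy in chosen]


rng = random.Random(0)
augmented, names = rand_augment_sketch([0, 100, 200], num_policies=2, rng=rng)
print(names, augmented)
```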

MMCV transforms ¶

We also provide many transforms in MMCV. You can use them directly in config files. Only some frequently used transforms are listed here; the full list can be found in mmcv.transforms.

Transform Wrapper ¶

`MultiView`	A transform wrapper for multiple views of an image.
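
What "multiple views of an image" means can be illustrated with a small sketch. It assumes that a multi-view wrapper runs a random pipeline several times on copies of the same sample and collects the resulting images into a list; `multi_view_sketch` and `random_shift` are hypothetical helpers written for this example and do not reproduce the actual `MultiView` code or signature.

```python
import copy
import random


def random_shift(results, rng):
    """Toy random transform: adds a random offset to every value of 'img'."""
    offset = rng.randint(-5, 5)
    results['img'] = [v + offset for v in results['img']]
    return results


def multi_view_sketch(results, pipelines, num_views, rng):
    """Run each pipeline `n` times on copies of the sample and gather the views."""
    views = []
    for pipeline, n in zip(pipelines, num_views):
        for _ in range(n):
            view = copy.deepcopy(results)  # every view starts from the same source
            for transform in pipeline:
                view = transform(view, rng)
            views.append(view['img'])
    out = dict(results)
    out['img'] = views  # 'img' now holds a list of views instead of one image
    return out


rng = random.Random(0)
sample = {'img': [10, 20, 30]}
out = multi_view_sketch(sample, pipelines=[[random_shift]], num_views=[2], rng=rng)
print(len(out['img']))
```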

TorchVision Transforms ¶

We also provide all the transforms in TorchVision. You can use them as in the following examples:

1. Use some TorchVision Augs Surrounded by NumpyToPIL and PILToNumpy (Recommended)

Place the TorchVision Augs between dict(type='NumpyToPIL', to_rgb=True) and dict(type='PILToNumpy', to_bgr=True):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),     # from BGR in cv2 to RGB in PIL
    dict(type='torchvision/RandomResizedCrop', size=176),
    dict(type='PILToNumpy', to_bgr=True),     # from RGB in PIL to BGR in cv2
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,                          # from BGR in cv2 to RGB  in PIL
)

2. Use TorchVision Augs and ToTensor&Normalize

Make sure the ‘img’ has been converted to PIL format from BGR-Numpy format before being processed by TorchVision Augs.

import torch  # required for `dtype=torch.float` below

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),       # from BGR in cv2 to RGB  in PIL
    dict(
        type='torchvision/RandomResizedCrop',
        size=176,
        interpolation='bilinear'),            # accept str format interpolation mode
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(
        type='torchvision/TrivialAugmentWide',
        interpolation='bilinear'),
    dict(type='torchvision/PILToTensor'),
    dict(type='torchvision/ConvertImageDtype', dtype=torch.float),
    dict(
        type='torchvision/Normalize',
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
    ),
    dict(type='torchvision/RandomErasing', p=0.1),
    dict(type='PackInputs'),
]

data_preprocessor = dict(num_classes=1000, mean=None, std=None, to_rgb=False)  # Normalize in dataset pipeline

3. Use TorchVision Augs Except ToTensor&Normalize

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),   # from BGR in cv2 to RGB  in PIL
    dict(type='torchvision/RandomResizedCrop', size=176, interpolation='bilinear'),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(type='torchvision/TrivialAugmentWide', interpolation='bilinear'),
    dict(type='PackInputs'),
]

# the Normalize params here are for the RGB format
data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=False,
)
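
The two sets of normalization constants used in the examples above describe the same statistics on different value ranges. In option 2 the image is converted to floats in [0, 1] before `torchvision/Normalize`, while in options 1 and 3 the data preprocessor normalizes values in [0, 255]. The short check below is only a worked arithmetic example; the helper `normalize` is a hypothetical name, and the constants are copied from the configs above.

```python
import math

# Constants copied from the examples above.
mean_01 = (0.485, 0.456, 0.406)        # for float images in [0, 1]
std_01 = (0.229, 0.224, 0.225)
mean_255 = [123.675, 116.28, 103.53]   # for images in [0, 255]
std_255 = [58.395, 57.12, 57.375]

# Scaling the [0, 1] constants by 255 gives the [0, 255] constants.
scaled_mean = [m * 255 for m in mean_01]
scaled_std = [s * 255 for s in std_01]


def normalize(value, mean, std):
    """Normalize one channel value."""
    return (value - mean) / std


pixel = 200  # a red-channel value in [0, 255]
via_torchvision = normalize(pixel / 255, mean_01[0], std_01[0])
via_preprocessor = normalize(pixel, mean_255[0], std_255[0])
print(math.isclose(via_torchvision, via_preprocessor, rel_tol=1e-9))
```

Either place works for normalization, as long as it is done exactly once and with the constants that match the value range at that point.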

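Data Preprocessors¶

The data preprocessor is also a component that processes the data before it is fed into the neural network. Unlike the data transforms, the data preprocessor is a module of the model and receives a whole batch of data at once, which means it can use the device of the model (such as a GPU) and batch processing to speed things up.

The default data preprocessor in MMPreTrain can perform the following pre-processing: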

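1. Move the data to the device where the model is located.
2. Pad inputs of different sizes to a uniform size.
3. Stack the list of input tensors into a batch.
4. Convert the inputs from BGR to RGB if the input tensor has the shape (3, H, W).
5. Normalize the image with the given mean and standard deviation.
6. Apply batch augmentations such as Mixup and CutMix during training.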
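
The channel conversion and normalization steps can be sketched with nested lists standing in for (3, H, W) tensors. This is an illustration only: device transfer, padding, stacking and batch augmentation are left out, and `bgr_to_rgb`, `normalize` and `preprocess_sketch` are hypothetical helpers, not the API of the MMPreTrain preprocessor.

```python
def bgr_to_rgb(image):
    """Reverse the channel axis of a (3, H, W) nested-list image."""
    return image[::-1]


def normalize(image, mean, std):
    """Apply (value - mean) / std with one mean/std pair per channel."""
    return [
        [[(v - m) / s for v in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


def preprocess_sketch(batch, mean, std, to_rgb):
    """Convert channel order if requested, then normalize every image."""
    out = []
    for image in batch:
        if to_rgb and len(image) == 3:  # only 3-channel inputs are converted
            image = bgr_to_rgb(image)
        out.append(normalize(image, mean, std))
    return out


# RGB-order statistics, as in the configs of this document.
mean = [123.675, 116.28, 103.53]
std = [58.395, 57.12, 57.375]

# One 1x1 image stored in BGR order whose pixel equals the RGB mean.
bgr_image = [[[103.53]], [[116.28]], [[123.675]]]
batch_out = preprocess_sketch([bgr_image], mean, std, to_rgb=True)
print(batch_out[0])
```

With `to_rgb=False` the same pixel would be normalized against the wrong channel statistics, which is why the flag must match the channel order produced by the pipeline.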

You can configure the data preprocessor through the data_preprocessor field or the model.data_preprocessor field of the config file. A typical usage is as follows:

data_preprocessor = dict(
    # RGB format normalization parameters
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,    # convert image from BGR to RGB
)

Or configure it in the model.data_preprocessor field as follows:

model = dict(
    backbone=...,
    neck=...,
    head=...,
    data_preprocessor=dict(
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    train_cfg=...,
)

Note that if both fields are configured, model.data_preprocessor has the higher priority.
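
The priority rule can be pictured as a "fill in only if missing" merge. The sketch below works on plain dicts and assumes that the top-level field is only used as a default for the model; `resolve_data_preprocessor` and `ToyModel` are hypothetical names, and the real resolution happens inside the framework.

```python
def resolve_data_preprocessor(cfg):
    """Return the preprocessor config that would end up on the model."""
    model = dict(cfg['model'])  # copy so the input config is not modified
    top_level = cfg.get('data_preprocessor')
    if top_level is not None:
        # setdefault only fills the field when the model does not define it
        model.setdefault('data_preprocessor', top_level)
    return model.get('data_preprocessor')


only_top = dict(
    data_preprocessor=dict(to_rgb=True),
    model=dict(type='ToyModel'),
)
both = dict(
    data_preprocessor=dict(to_rgb=True),
    model=dict(type='ToyModel', data_preprocessor=dict(to_rgb=False)),
)

print(resolve_data_preprocessor(only_top))
print(resolve_data_preprocessor(both))
```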

`ClsDataPreprocessor`	Image pre-processor for classification tasks.
`SelfSupDataPreprocessor`	Image pre-processor for operations such as normalization and BGR to RGB conversion.
`TwoNormDataPreprocessor`	Image pre-processor for CAE, BEiT v1/v2, etc.
`VideoDataPreprocessor`	Video pre-processor for operations such as normalization and BGR to RGB conversion.

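Batch Augmentations¶

Batch augmentation is a feature of the data preprocessor. It uses multiple samples of a batch and mixes them in some way, as Mixup and CutMix do.

These augmentations only take effect during training, so we configure them through the model.train_cfg field.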

model = dict(
    backbone=...,
    neck=...,
    head=...,
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0),
    ]),
)
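
To make "mixing multiple samples" concrete, here is a minimal Mixup sketch on plain lists. It follows the usual Mixup formulation, where a ratio `lam` is drawn from Beta(alpha, alpha) and each sample and its one-hot label are blended with a randomly chosen partner from the same batch. `mixup_sketch` is a hypothetical helper written for this example, not the MMPreTrain implementation.

```python
import random


def mixup_sketch(images, labels, alpha, rng):
    """Blend every sample and label with a random partner from the batch."""
    lam = rng.betavariate(alpha, alpha)  # mixing ratio in [0, 1]
    index = list(range(len(images)))
    rng.shuffle(index)                   # index[i] is the partner of sample i
    mixed_images = [
        [lam * a + (1 - lam) * b for a, b in zip(images[i], images[j])]
        for i, j in enumerate(index)
    ]
    mixed_labels = [
        [lam * a + (1 - lam) * b for a, b in zip(labels[i], labels[j])]
        for i, j in enumerate(index)
    ]
    return mixed_images, mixed_labels, lam


rng = random.Random(0)
images = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]   # three flattened toy images
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # one-hot labels
mixed_images, mixed_labels, lam = mixup_sketch(images, labels, alpha=0.8, rng=rng)
print(lam, mixed_labels)
```

The mixed labels are soft labels: each row still sums to 1, but the weight is split between the two classes that were blended.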

You can also use the probs field to specify the probability of each batch augmentation.

model = dict(
    backbone=...,
    neck=...,
    head=...,
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0),
    ], probs=[0.3, 0.7])
)
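
One way to read the probs field is as weights for choosing a single augmentation per batch. The sketch below assumes exactly that, and additionally assumes that any probability mass left over when the values sum to less than 1 means "apply no batch augmentation"; both points are assumptions of this example, and `pick_augment` is a hypothetical helper.

```python
import random


def pick_augment(names, probs, rng):
    """Choose one augmentation name per batch according to `probs`."""
    candidates = list(names)
    weights = list(probs)
    leftover = 1.0 - sum(probs)
    if leftover > 1e-9:
        candidates.append(None)  # None stands for "no batch augmentation"
        weights.append(leftover)
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)
counts = {'Mixup': 0, 'CutMix': 0}
for _ in range(10000):
    counts[pick_augment(['Mixup', 'CutMix'], [0.3, 0.7], rng)] += 1
print(counts)
```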

Here is a list of batch augmentations that can be used in MMPreTrain.

`Mixup`	Mixup batch augmentation.
`CutMix`	CutMix batch augmentation.
`ResizeMix`	ResizeMix Random Paste layer for a batch of data.
