
Data Process

In MMPreTrain, data processing and datasets are decoupled. A dataset only defines how to get each sample's basic information from the file system, such as the ground-truth label and the raw image data or the image path. Data processing consists of data transforms, data preprocessors and batch augmentations.

  • Data Transforms: Transforms include loading, preprocessing, formatting, etc.

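  • Data Preprocessors: Preprocessors handle collation, normalization, stacking, channel flipping, etc.

  • Batch Augmentations: Augmentations that mix several samples within a batch, such as Mixup and CutMix.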

Data Transforms

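To prepare the input data, we need to apply some transforms to the basic information stored in the dataset. These transforms include data loading, some preprocessing and augmentation, and formatting. A sequence of data transforms forms a data pipeline, so the dataset config usually contains a pipeline argument, for example: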

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

train_dataloader = dict(
    ....
    dataset=dict(
        pipeline=train_pipeline,
        ....),
    ....
)
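Conceptually, a pipeline passes a dict of sample information through each transform in turn. The sketch below illustrates that idea under simplifying assumptions: every transform is a plain callable that takes a dict and returns an updated dict, and the class names (LoadFakeImage, HorizontalFlip, Pipeline) are hypothetical stand-ins, not MMPreTrain or MMCV classes.

```python
from typing import Any, Callable, Dict, List

import numpy as np

Results = Dict[str, Any]


class LoadFakeImage:
    """Hypothetical stand-in for a loading transform: fills 'img' with a dummy array."""

    def __call__(self, results: Results) -> Results:
        h, w = results['height'], results['width']
        results['img'] = np.zeros((h, w, 3), dtype=np.uint8)
        return results


class HorizontalFlip:
    """Hypothetical deterministic flip (think of a flip with prob=1.0)."""

    def __call__(self, results: Results) -> Results:
        results['img'] = results['img'][:, ::-1]
        results['flipped'] = True
        return results


class Pipeline:
    """Apply each transform in order, handing the results dict to the next one."""

    def __init__(self, transforms: List[Callable[[Results], Results]]):
        self.transforms = transforms

    def __call__(self, results: Results) -> Results:
        for transform in self.transforms:
            results = transform(results)
        return results


pipeline = Pipeline([LoadFakeImage(), HorizontalFlip()])
out = pipeline({'height': 4, 'width': 6, 'gt_label': 3})
print(out['img'].shape, out['flipped'], out['gt_label'])  # (4, 6, 3) True 3
```

Because every step shares the same dict-in, dict-out contract, transforms can be reordered or swapped in the config without changing the dataset code.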

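Each item in the pipeline list is one of the data transform classes below. If you want to add a custom data transform class, please refer to the Customize Data Pipeline tutorial.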

Loading and Formatting

LoadImageFromFile

Load an image from a file path.

PackInputs

Pack the input data.

PackMultiTaskInputs

Convert all image labels of a multi-task dataset to a dict of tensors.

PILToNumpy

Convert img to numpy.ndarray.

NumpyToPIL

Convert an image from OpenCV format to PIL.Image.Image format.

Transpose

Transpose NumPy arrays.

Collect

Collect and keep only the data of the specified fields.
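As a rough illustration of what Transpose and Collect do to the results dict, here is a NumPy sketch. The function names and the keys/order arguments are assumptions chosen for illustration; the real transform classes may take different arguments.

```python
from typing import Any, Dict, Sequence

import numpy as np


def transpose(results: Dict[str, Any], keys: Sequence[str],
              order: Sequence[int]) -> Dict[str, Any]:
    """Hypothetical helper: transpose the arrays under `keys` to the given axis order."""
    for key in keys:
        results[key] = results[key].transpose(order)
    return results


def collect(results: Dict[str, Any], keys: Sequence[str]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the fields listed in `keys`."""
    return {key: results[key] for key in keys}


sample = {'img': np.zeros((32, 48, 3)), 'gt_label': 1, 'img_path': 'demo.jpg'}
sample = transpose(sample, keys=['img'], order=(2, 0, 1))  # HWC -> CHW
sample = collect(sample, keys=['img', 'gt_label'])         # drops 'img_path'
print(sample['img'].shape, sorted(sample))  # (3, 32, 48) ['gt_label', 'img']
```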

Processing and Augmentation

Albumentations

A wrapper that applies data transforms from the Albumentations library.

CenterCrop

Crop the center of the image, segmentation masks, bounding boxes and key points.

ColorJitter

Randomly change the brightness, contrast and saturation of an image.

EfficientNetCenterCrop

EfficientNet-style center crop.

EfficientNetRandomCrop

EfficientNet-style random resized crop.

Lighting

Randomly adjust image lighting with AlexNet-style PCA jitter.

Normalize

Normalize the image.

RandomCrop

Crop the given image at a random location.

RandomErasing

Randomly select a rectangular region in an image and erase its pixels.

RandomFlip

Randomly flip the image, bboxes, keypoints, etc.

RandomGrayscale

Randomly convert the image to grayscale.

RandomResize

Randomly resize the image, bboxes, keypoints, etc.

RandomResizedCrop

Crop the given image to a random size and aspect ratio.

Resize

Resize the image, bboxes, segmentation maps, keypoints, etc.

ResizeEdge

Resize the image so that the specified edge has the given length.

BEiTMaskGenerator

Generate a mask for an image.

SimMIMMaskGenerator

Generate a random block mask for each image.
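The sketch below approximates two transforms used in the example pipeline: the crop-box sampling behind RandomResizedCrop (area fraction drawn uniformly from scale, aspect ratio drawn log-uniformly from ratio, as in the widely used Inception-style recipe) and the horizontal flip behind RandomFlip, including bounding boxes. The defaults, the simplified fallback, the helper names and the `w - x` box convention are assumptions of this sketch rather than MMPreTrain's exact implementation.

```python
import math
import random
from typing import Optional, Tuple

import numpy as np


def sample_crop_box(height: int, width: int,
                    scale: Tuple[float, float] = (0.08, 1.0),
                    ratio: Tuple[float, float] = (3 / 4, 4 / 3),
                    max_attempts: int = 10,
                    rng: Optional[random.Random] = None) -> Tuple[int, int, int, int]:
    """Return (top, left, crop_h, crop_w) with random area fraction and aspect ratio."""
    if rng is None:
        rng = random.Random()
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(max_attempts):
        target_area = height * width * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))  # log-uniform aspect ratio
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    return 0, 0, height, width  # simplified fallback (assumption): keep the whole image


def hflip(img: np.ndarray, bboxes: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Flip an (H, W, C) image and its (N, 4) x1, y1, x2, y2 boxes horizontally."""
    w = img.shape[1]
    flipped = bboxes.copy()
    flipped[:, 0] = w - bboxes[:, 2]  # new left edge comes from the old right edge
    flipped[:, 2] = w - bboxes[:, 0]  # new right edge comes from the old left edge
    return img[:, ::-1], flipped


img = np.zeros((100, 150, 3), dtype=np.uint8)
top, left, h, w = sample_crop_box(100, 150, rng=random.Random(0))
crop = img[top:top + h, left:left + w]  # would then be resized, e.g. to 224x224
_, boxes = hflip(np.zeros((2, 6, 1)), np.array([[1.0, 0.0, 3.0, 2.0]]))
print(boxes)  # [[3. 0. 5. 2.]]
```

RandomFlip would call something like hflip only with probability prob; the flip itself is deterministic.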

Composed Augmentation

Composed augmentation combines a series of data augmentation methods to augment each sample as a whole, e.g. AutoAugment and RandAugment.

AutoAugment

Auto augmentation.

RandAugment

Random augmentation.

The above transforms are composed of a group of policies drawn from the following random transforms:

AutoContrast

Automatically adjust image contrast.

Brightness

Adjust image brightness.

ColorTransform

Adjust the color balance of the image.

Contrast

Change image contrast.

Cutout

Erase part of the image.

Equalize

Equalize the image histogram.

GaussianBlur

Apply Gaussian blur to images.

Invert

Invert the image colors.

Posterize

Posterize the image (reduce the number of bits of each color channel).

Rotate

Rotate the image.

Sharpness

Change image sharpness.

Shear

Shear the image.

Solarize

Solarize the image (invert all pixel values above a threshold).

SolarizeAdd

Over-expose the image (add a fixed value to all pixel values below a threshold).

Translate

Translate the image.

BaseAugTransform

Base class of data transforms for composed augmentation.
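To make the composed augmentations more concrete, the sketch below implements four of the random transforms listed above (Invert, Solarize, SolarizeAdd, Posterize) on uint8 NumPy images, plus a RandAugment-style loop that applies a few randomly chosen ops in sequence. The magnitude-to-parameter mappings, the SolarizeAdd threshold of 128 with an offset of up to 110, and sampling ops with replacement are assumptions for illustration; MMPreTrain's AutoAugment and RandAugment are configured through policies and may behave differently.

```python
import random
from typing import Callable, List, Optional

import numpy as np

# Each op takes a uint8 image and a magnitude in [0, 1]; how the magnitude maps
# to each op's own parameter is an assumption of this sketch.
Op = Callable[[np.ndarray, float], np.ndarray]


def invert(img: np.ndarray, magnitude: float) -> np.ndarray:
    return 255 - img  # magnitude unused: inversion has no strength


def solarize(img: np.ndarray, magnitude: float) -> np.ndarray:
    thr = 256 - magnitude * 256  # magnitude 0 -> nothing is inverted
    return np.where(img >= thr, 255 - img, img).astype(np.uint8)


def solarize_add(img: np.ndarray, magnitude: float, thr: int = 128) -> np.ndarray:
    add = int(magnitude * 110)  # fixed offset added below the threshold
    brighter = np.minimum(img.astype(np.int16) + add, 255)
    return np.where(img < thr, brighter, img).astype(np.uint8)


def posterize(img: np.ndarray, magnitude: float) -> np.ndarray:
    bits = 8 - int(round(magnitude * 4))  # keep 4..8 bits per channel
    mask = (0xFF << (8 - bits)) & 0xFF
    return img & np.uint8(mask)


def rand_augment(img: np.ndarray, ops: List[Op], num_ops: int = 2,
                 magnitude: float = 0.5,
                 rng: Optional[random.Random] = None) -> np.ndarray:
    """Apply `num_ops` ops drawn with replacement, all at the same magnitude."""
    if rng is None:
        rng = random.Random()
    for op in rng.choices(ops, k=num_ops):
        img = op(img, magnitude)
    return img


img = np.arange(256, dtype=np.uint8).reshape(16, 16)
out = rand_augment(img, [invert, solarize, solarize_add, posterize],
                   rng=random.Random(0))
print(out.dtype, out.shape)  # uint8 (16, 16)
```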

Data Transforms in MMCV

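We also provide many data transform classes in MMCV, and you can use them directly in config files. Some commonly used ones are listed here; the complete list of data transform classes can be found in mmcv.transforms.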

Transform Wrapper

MultiView

A transform wrapper for multiple views of an image.

TorchVision Transforms

We also support all the transforms in TorchVision. Refer to them by adding the torchvision/ prefix to the type, as in the following examples:

1. Use TorchVision augmentations surrounded by NumpyToPIL and PILToNumpy (recommended)

Place the TorchVision augmentations between dict(type='NumpyToPIL', to_rgb=True) and dict(type='PILToNumpy', to_bgr=True):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),     # from BGR in cv2 to RGB in PIL
    dict(type='torchvision/RandomResizedCrop', size=176),
    dict(type='PILToNumpy', to_bgr=True),     # from RGB in PIL to BGR in cv2
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackInputs'),
]

data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,                          # convert BGR (cv2 order) to RGB
)

2. Use TorchVision augmentations together with ToTensor and Normalize

Make sure 'img' has been converted from BGR NumPy format to PIL format before it is processed by the TorchVision augmentations.

import torch  # needed for dtype=torch.float below

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),       # from BGR in cv2 to RGB in PIL
    dict(
        type='torchvision/RandomResizedCrop',
        size=176,
        interpolation='bilinear'),            # interpolation mode can be given as a string
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(
        type='torchvision/TrivialAugmentWide',
        interpolation='bilinear'),
    dict(type='torchvision/PILToTensor'),
    dict(type='torchvision/ConvertImageDtype', dtype=torch.float),
    dict(
        type='torchvision/Normalize',
        mean=(0.485, 0.456, 0.406),
        std=(0.229, 0.224, 0.225),
    ),
    dict(type='torchvision/RandomErasing', p=0.1),
    dict(type='PackInputs'),
]

data_preprocessor = dict(num_classes=1000, mean=None, std=None, to_rgb=False)  # Normalize in dataset pipeline

3. Use TorchVision augmentations except ToTensor and Normalize

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='NumpyToPIL', to_rgb=True),   # from BGR in cv2 to RGB in PIL
    dict(type='torchvision/RandomResizedCrop', size=176, interpolation='bilinear'),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(type='torchvision/TrivialAugmentWide', interpolation='bilinear'),
    dict(type='PackInputs'),
]

# here the Normalize params are for the RGB format; the image is already RGB, so to_rgb=False
data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=False,
)
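The to_rgb settings in the three options follow one rule: the preprocessor's mean and std are given in RGB order, so the channels must be flipped only when the array reaching the preprocessor is still BGR. The NumPy check below illustrates this; the normalize helper is a simplified, hypothetical stand-in for the data preprocessor, not its actual code.

```python
import numpy as np

MEAN_RGB = np.array([123.675, 116.28, 103.53]).reshape(3, 1, 1)
STD_RGB = np.array([58.395, 57.12, 57.375]).reshape(3, 1, 1)


def normalize(chw: np.ndarray, to_rgb: bool) -> np.ndarray:
    """Optionally reverse the channel axis, then normalize with RGB statistics."""
    if to_rgb:
        chw = chw[::-1]  # BGR -> RGB
    return (chw - MEAN_RGB) / STD_RGB


rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(3, 4, 4)).astype(np.float64)
bgr = rgb[::-1]

# Option 1 hands BGR arrays to the preprocessor (to_rgb=True is required);
# option 3 hands it RGB images from NumpyToPIL (to_rgb=False is required).
same = np.allclose(normalize(bgr, to_rgb=True), normalize(rgb, to_rgb=False))
wrong = np.allclose(normalize(rgb, to_rgb=True), normalize(rgb, to_rgb=False))
print(same, wrong)  # True False
```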

Data Preprocessors

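A data preprocessor is also a component that processes data before it is fed into the neural network. Unlike data transforms, the data preprocessor is a module of the model and receives a whole batch of data at once, which means it can run on the model's device (such as a GPU) and use batch processing for acceleration.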

The default data preprocessor in MMPreTrain can perform the following pre-processing steps:

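  1. Move the data to the model's device.

  2. Pad inputs of different sizes to a uniform size.

  3. Stack a list of input tensors into a batch.

  4. Convert the channels from BGR to RGB if the input tensors have shape (3, H, W).

  5. Normalize the images with the given mean and standard deviation.

  6. Apply batch augmentations such as Mixup and CutMix during training.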
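The steps above can be sketched in NumPy as follows. This is a simplified illustration, not the ClsDataPreprocessor source: NumPy arrays stand in for tensors, so moving data to a device (step 1) and batch augmentation (step 6) are omitted, and normalizing each image before padding it at the bottom and right is an assumed ordering. The function name preprocess_batch is hypothetical.

```python
from typing import List, Sequence

import numpy as np


def preprocess_batch(imgs: List[np.ndarray], mean: Sequence[float],
                     std: Sequence[float], to_rgb: bool = True,
                     pad_value: float = 0.0) -> np.ndarray:
    """Flip BGR->RGB, normalize, then pad and stack a list of (3, H, W) images."""
    mean_arr = np.asarray(mean, dtype=np.float32).reshape(3, 1, 1)
    std_arr = np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
    processed = []
    for img in imgs:
        img = img.astype(np.float32)
        if to_rgb:
            img = img[::-1]  # reverse the channel axis: BGR -> RGB
        processed.append((img - mean_arr) / std_arr)
    max_h = max(p.shape[1] for p in processed)
    max_w = max(p.shape[2] for p in processed)
    batch = np.full((len(processed), 3, max_h, max_w), pad_value, dtype=np.float32)
    for i, p in enumerate(processed):
        batch[i, :, :p.shape[1], :p.shape[2]] = p  # pad bottom/right, stack
    return batch


# A 2x3 BGR image whose pixels equal the mean, and an all-zero 4x2 image.
bgr_a = np.stack([np.full((2, 3), 103.53), np.full((2, 3), 116.28),
                  np.full((2, 3), 123.675)])
bgr_b = np.zeros((3, 4, 2))
batch = preprocess_batch([bgr_a, bgr_b], mean=[123.675, 116.28, 103.53],
                         std=[58.395, 57.12, 57.375])
print(batch.shape)  # (2, 3, 4, 3)
```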

You can configure the data preprocessor in the data_preprocessor field of the config file, or in the model.data_preprocessor field. A typical usage is as follows:

data_preprocessor = dict(
    # RGB format normalization parameters
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,    # convert image from BGR to RGB
)

Or configure it in the model.data_preprocessor field as follows:

model = dict(
    backbone=...,
    neck=...,
    head=...,
    data_preprocessor=dict(
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    train_cfg=...,
)

Note that if both are configured, model.data_preprocessor takes precedence.

ClsDataPreprocessor

Image pre-processor for classification tasks.

SelfSupDataPreprocessor

Image pre-processor for operations such as normalization and BGR-to-RGB conversion.

TwoNormDataPreprocessor

Image pre-processor for CAE, BEiT v1/v2, etc.

VideoDataPreprocessor

Video pre-processor for operations such as normalization and BGR-to-RGB conversion.

Batch Augmentations

Batch augmentation is a feature of the data preprocessor. It uses multiple samples in a batch and mixes them in some way, as in Mixup and CutMix.

These augmentations only take effect during training, so they are configured in the model.train_cfg field.

model = dict(
    backbone=...,
    neck=...,
    head=...,
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0),
    ]),
)

You can also specify the probability of each batch augmentation with the probs field.

model = dict(
    backbone=...,
    neck=...,
    head=...,
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0),
    ], probs=[0.3, 0.7])
)
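The probs field can be read as choosing at most one batch augmentation per batch. The sketch below models that behaviour; the function name is hypothetical, and treating leftover probability (when probs sums to less than 1) as "no augmentation", as well as a uniform choice when probs is omitted, are assumptions rather than documented MMPreTrain behaviour.

```python
import random
from typing import Optional, Sequence


def pick_batch_augment(names: Sequence[str],
                       probs: Optional[Sequence[float]] = None,
                       rng: Optional[random.Random] = None) -> Optional[str]:
    """Choose at most one batch augmentation for the current batch."""
    if rng is None:
        rng = random.Random()
    if probs is None:
        return rng.choice(list(names))  # uniform choice (assumption)
    if len(probs) != len(names) or sum(probs) > 1 + 1e-6:
        raise ValueError('probs must match names and sum to at most 1')
    # Leftover probability mass means "no batch augmentation" (assumption).
    options = list(names) + [None]
    weights = list(probs) + [max(0.0, 1 - sum(probs))]
    return rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(0)
draws = [pick_batch_augment(['Mixup', 'CutMix'], [0.3, 0.7], rng) for _ in range(10000)]
print(draws.count('Mixup') / len(draws))  # roughly 0.3
```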

Here is a list of batch augmentations that can be used in MMPreTrain.

Mixup

Mixup batch augmentation.

CutMix

CutMix batch augmentation.

ResizeMix

ResizeMix Random Paste layer for a batch of data.
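For reference, the sketch below follows the original Mixup and CutMix formulations on a NumPy batch with one-hot labels: a mixing ratio lam is drawn from Beta(alpha, alpha), each sample is paired with a randomly permuted partner, and labels are mixed with the same ratio (for CutMix, the ratio is recomputed from the pasted area after clipping the box). It is an illustration under these assumptions, not MMPreTrain's Mixup or CutMix code.

```python
from typing import Optional, Tuple

import numpy as np


def mixup(imgs: np.ndarray, onehot: np.ndarray, alpha: float,
          rng: Optional[np.random.Generator] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Blend each sample with a permuted partner from the same batch."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(imgs))
    mixed = lam * imgs + (1 - lam) * imgs[perm]
    labels = lam * onehot + (1 - lam) * onehot[perm]
    return mixed, labels


def cutmix(imgs: np.ndarray, onehot: np.ndarray, alpha: float,
           rng: Optional[np.random.Generator] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Paste a random box from a permuted partner; mix labels by pasted area."""
    if rng is None:
        rng = np.random.default_rng()
    n, _, h, w = imgs.shape
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)  # box centre
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    mixed = imgs.copy()
    mixed[:, :, y1:y2, x1:x2] = imgs[perm, :, y1:y2, x1:x2]
    # Recompute lam from the clipped box so the labels match the pasted area.
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)
    labels = lam * onehot + (1 - lam) * onehot[perm]
    return mixed, labels


rng = np.random.default_rng(0)
imgs = rng.random((4, 3, 8, 8))
onehot = np.eye(4)
mixed_m, labels_m = mixup(imgs, onehot, alpha=0.8, rng=rng)
mixed_c, labels_c = cutmix(imgs, onehot, alpha=1.0, rng=rng)
print(labels_m.sum(axis=1), labels_c.sum(axis=1))  # each row sums to 1
```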
