mmpretrain.datasets

The datasets package contains the datasets commonly used for classification tasks, as well as some dataset wrappers.
Custom Dataset
- class mmpretrain.datasets.CustomDataset(data_root='', data_prefix='', ann_file='', with_label=True, extensions=('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif'), metainfo=None, lazy_init=False, **kwargs)[source]
A generic dataset for multiple tasks.
The dataset supports two kinds of styles.

1. Use an annotation file to specify all samples, and each line indicates a sample:

   The annotation file (for with_label=True, supervised tasks):

       folder_1/xxx.png 0
       folder_1/xxy.png 1
       123.png 4
       nsdf3.png 3
       ...

   The annotation file (for with_label=False, unsupervised tasks):

       folder_1/xxx.png
       folder_1/xxy.png
       123.png
       nsdf3.png
       ...

   Sample files:

       data_prefix/
       ├── folder_1
       │   ├── xxx.png
       │   ├── xxy.png
       │   └── ...
       ├── 123.png
       ├── nsdf3.png
       └── ...

   Please use the argument metainfo to specify extra information for the task, like {'classes': ('bird', 'cat', 'deer', 'dog', 'frog')}.

2. Place all samples in one folder as below:

   Sample files (for with_label=True, supervised tasks, we use the names of the sub-folders as the category names):

       data_prefix/
       ├── class_x
       │   ├── xxx.png
       │   ├── xxy.png
       │   ├── ...
       │   └── xxz.png
       └── class_y
           ├── 123.png
           ├── nsdf3.png
           ├── ...
           └── asd932_.png

   Sample files (for with_label=False, unsupervised tasks, we use all sample files under the specified folder):

       data_prefix/
       ├── folder_1
       │   ├── xxx.png
       │   ├── xxy.png
       │   └── ...
       ├── 123.png
       ├── nsdf3.png
       └── ...

If the ann_file is specified, the dataset will be generated by the first way; otherwise, it will try the second way.

Parameters:
- data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.
- data_prefix (str | dict) – Prefix for the data. Defaults to ''.
- ann_file (str) – Annotation file path. Defaults to ''.
- with_label (bool) – Whether the annotation file includes ground-truth labels, or to use sub-folder names to specify categories. Defaults to True.
- extensions (Sequence[str]) – A sequence of allowed extensions. Defaults to ('.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif').
- metainfo (dict, optional) – Meta information for the dataset, such as class information. Defaults to None.
- lazy_init (bool) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.
- **kwargs – Other keyword arguments in BaseDataset.
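A minimal usage sketch of the two styles above (the data/custom_dataset paths and the meta/train.txt file are assumptions; adjust them to your own layout):

>>> from mmpretrain.datasets import CustomDataset
>>> # Style 1: an annotation file lists "<relative path> <class index>" per line.
>>> train_dataset = CustomDataset(
...     data_root='data/custom_dataset',
...     data_prefix='train',
...     ann_file='meta/train.txt',
...     metainfo={'classes': ('bird', 'cat', 'deer', 'dog', 'frog')},
... )
>>> # Style 2: no ann_file; sub-folder names under the folder become the classes.
>>> train_dataset = CustomDataset(data_root='data/custom_dataset/train')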
ImageNet
- class mmpretrain.datasets.ImageNet(data_root='', data_prefix='', ann_file='', metainfo=None, **kwargs)[source]
ImageNet Dataset.
The dataset supports two kinds of annotation formats. More details can be found in CustomDataset.

Parameters:
- data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.
- data_prefix (str | dict) – Prefix for training data. Defaults to ''.
- ann_file (str) – Annotation file path. Defaults to ''.
- metainfo (dict, optional) – Meta information for the dataset, such as class information. Defaults to None.
- **kwargs – Other keyword arguments in CustomDataset and BaseDataset.
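A minimal sketch (data/imagenet and the meta file name are assumed paths; point them at your own copy of the dataset):

>>> from mmpretrain.datasets import ImageNet
>>> train_dataset = ImageNet(
...     data_root='data/imagenet',
...     data_prefix='train',
...     ann_file='meta/train.txt',
... )
>>> # Without ann_file, sub-folder names under the prefix define the classes.
>>> val_dataset = ImageNet(data_root='data/imagenet', data_prefix='val')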
- class mmpretrain.datasets.ImageNet21k(data_root='', data_prefix='', ann_file='', metainfo=None, multi_label=False, **kwargs)[source]
ImageNet21k Dataset.
Since the ImageNet21k dataset is extremely large, containing 21k+ classes and 14M+ files, we do not provide a default categories list. Please specify it via the classes argument.

Parameters:
- data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.
- data_prefix (str | dict) – Prefix for training data. Defaults to ''.
- ann_file (str) – Annotation file path. Defaults to ''.
- metainfo (dict, optional) – Meta information for the dataset, such as class information. Defaults to None.
- multi_label (bool) – Whether to use multi-label annotations. Not implemented yet. Defaults to False.
- **kwargs – Other keyword arguments in CustomDataset and BaseDataset.
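A minimal sketch of supplying your own class list, as required above (the paths below are assumptions):

>>> from mmpretrain.datasets import ImageNet21k
>>> dataset = ImageNet21k(
...     data_root='data/imagenet21k',
...     ann_file='meta/train.txt',
...     classes='meta/classes.txt',  # a text file with one class name per line
... )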
CIFAR
- class mmpretrain.datasets.CIFAR10(data_root='', split='train', metainfo=None, download=True, data_prefix='', test_mode=False, **kwargs)[source]
CIFAR10 Dataset.
This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py
Parameters:
- data_root (str) – The root directory of the CIFAR dataset.
- split (str, optional) – The dataset split, supports "train" and "test". Defaults to "train".
- metainfo (dict, optional) – Meta information for the dataset, such as categories information. Defaults to None.
- download (bool) – Whether to download the dataset if it does not exist. Defaults to True.
- **kwargs – Other keyword arguments in BaseDataset.
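A minimal sketch; with download=True the archive is fetched into data_root automatically (the path below is an assumption):

>>> from mmpretrain.datasets import CIFAR10
>>> train_dataset = CIFAR10(data_root='data/cifar10', split='train', download=True)
>>> test_dataset = CIFAR10(data_root='data/cifar10', split='test')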
- class mmpretrain.datasets.CIFAR100(data_root='', split='train', metainfo=None, download=True, data_prefix='', test_mode=False, **kwargs)[source]
CIFAR100 Dataset.
Parameters:
- data_root (str) – The root directory of the CIFAR dataset.
- split (str, optional) – The dataset split, supports "train" and "test". Defaults to "train".
- metainfo (dict, optional) – Meta information for the dataset, such as categories information. Defaults to None.
- download (bool) – Whether to download the dataset if it does not exist. Defaults to True.
- **kwargs – Other keyword arguments in BaseDataset.
MNIST
- class mmpretrain.datasets.MNIST(data_prefix, test_mode, metainfo=None, data_root='', download=True, **kwargs)[source]
MNIST Dataset.
This implementation is modified from https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py
Parameters:
- data_prefix (str) – Prefix for data.
- test_mode (bool) – test_mode=True means the dataset is in the test phase; it determines whether to use the training set or the test set.
- metainfo (dict, optional) – Meta information for the dataset, such as categories information. Defaults to None.
- data_root (str) – The root directory for data_prefix. Defaults to ''.
- download (bool) – Whether to download the dataset if it does not exist. Defaults to True.
- **kwargs – Other keyword arguments in BaseDataset.
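A minimal sketch (data/mnist is an assumed path; download=True fetches the files if they are missing):

>>> from mmpretrain.datasets import MNIST
>>> train_dataset = MNIST(data_prefix='data/mnist', test_mode=False)
>>> test_dataset = MNIST(data_prefix='data/mnist', test_mode=True)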
- class mmpretrain.datasets.FashionMNIST(data_prefix, test_mode, metainfo=None, data_root='', download=True, **kwargs)[source]
Fashion-MNIST Dataset.
Parameters:
- data_prefix (str) – Prefix for data.
- test_mode (bool) – test_mode=True means the dataset is in the test phase; it determines whether to use the training set or the test set.
- metainfo (dict, optional) – Meta information for the dataset, such as categories information. Defaults to None.
- data_root (str) – The root directory for data_prefix. Defaults to ''.
- download (bool) – Whether to download the dataset if it does not exist. Defaults to True.
- **kwargs – Other keyword arguments in BaseDataset.
VOC
- class mmpretrain.datasets.VOC(data_root, image_set_path, data_prefix={'ann_path': 'Annotations', 'img_path': 'JPEGImages'}, test_mode=False, metainfo=None, **kwargs)[source]
Pascal VOC Dataset.
After decompression, the dataset directory structure is as follows:
VOC dataset directory:
VOC2007 (data_root)/
├── JPEGImages (data_prefix['img_path'])
│   ├── xxx.jpg
│   ├── xxy.jpg
│   └── ...
├── Annotations (data_prefix['ann_path'])
│   ├── xxx.xml
│   ├── xxy.xml
│   └── ...
└── ImageSets (directory contains various imageset files)
VOC annotations contain an extra "difficult" label. We use gt_label_difficult to record the difficult labels in each sample, and the corresponding evaluation should take this field into account when calculating metrics. By default, difficult labels are usually treated as negative.
Parameters:
- data_root (str) – The root directory for the VOC dataset.
- image_set_path (str) – The path of the image set file, which lists the image ids of the sub dataset; this path is relative to data_root.
- data_prefix (dict) – Prefix for data and annotations; the keys 'img_path' and 'ann_path' can be set. Defaults to dict(img_path='JPEGImages', ann_path='Annotations').
- test_mode (bool) – test_mode=True means the dataset is in the test phase; it determines whether to use the training set or the test set.
- metainfo (dict, optional) – Meta information for the dataset, such as categories information. Defaults to None.
- **kwargs – Other keyword arguments in BaseDataset.
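A minimal sketch (the data/VOC2007 root and the image set file path below are assumptions; use the image set file matching your split):

>>> from mmpretrain.datasets import VOC
>>> train_dataset = VOC(
...     data_root='data/VOC2007',
...     image_set_path='ImageSets/Main/trainval.txt',
... )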
CUB
- class mmpretrain.datasets.CUB(data_root, split='train', test_mode=False, **kwargs)[source]
The CUB-200-2011 Dataset.
Support the CUB-200-2011 Dataset. Compared with the CUB-200 dataset, CUB-200-2011 contains many more images. After downloading and decompression, the dataset directory structure is as follows.
CUB dataset directory:
CUB_200_2011
├── images
│   ├── class_x
│   │   ├── xx1.jpg
│   │   ├── xx2.jpg
│   │   └── ...
│   ├── class_y
│   │   ├── yy1.jpg
│   │   ├── yy2.jpg
│   │   └── ...
│   └── ...
├── images.txt
├── image_class_labels.txt
├── train_test_split.txt
└── ....
Parameters:
- data_root (str) – The root directory for the CUB-200-2011 dataset.
- split (str, optional) – The dataset split, supports "train" and "test". Defaults to "train".
- **kwargs – Other keyword arguments in BaseDataset.

Examples
>>> from mmpretrain.datasets import CUB
>>> train_dataset = CUB(data_root='data/CUB_200_2011', split='train')
>>> train_dataset
Dataset CUB
    Number of samples: 5994
    Number of categories: 200
    Root of dataset: data/CUB_200_2011
>>> test_dataset = CUB(data_root='data/CUB_200_2011', split='test')
>>> test_dataset
Dataset CUB
    Number of samples: 5794
    Number of categories: 200
    Root of dataset: data/CUB_200_2011
Places205
- class mmpretrain.datasets.Places205(data_root='', data_prefix='', ann_file='', metainfo=None, **kwargs)[source]
Places205 Dataset.
Parameters:
- data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.
- data_prefix (str | dict) – Prefix for training data. Defaults to ''.
- ann_file (str) – Annotation file path. Defaults to ''.
- metainfo (dict, optional) – Meta information for the dataset, such as class information. Defaults to None.
- **kwargs – Other keyword arguments in CustomDataset and BaseDataset.
Retrieval
- class mmpretrain.datasets.InShop(data_root, split='train', data_prefix='Img', ann_file='Eval/list_eval_partition.txt', **kwargs)[source]
InShop Dataset for Image Retrieval.
Please download the images from the homepage 'https://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/InShopRetrieval.html' (In-shop Clothes Retrieval Benchmark -> Img -> img.zip, Eval/list_eval_partition.txt), and organize them as follows:
In-shop Clothes Retrieval Benchmark (data_root)/
├── Eval
│   └── list_eval_partition.txt (ann_file)
├── Img (img_prefix)
│   └── img/
├── README.txt
└── .....
Parameters:
- data_root (str) – The root directory for the dataset.
- split (str) – Choose from 'train', 'query' and 'gallery'. Defaults to 'train'.
- data_prefix (str | dict) – Prefix for training data. Defaults to 'Img'.
- ann_file (str) – Annotation file path, relative to data_root. Defaults to 'Eval/list_eval_partition.txt'.
- **kwargs – Other keyword arguments in BaseDataset.
Examples
>>> from mmpretrain.datasets import InShop
>>>
>>> # build train InShop dataset
>>> inshop_train_cfg = dict(data_root='data/inshop', split='train')
>>> inshop_train = InShop(**inshop_train_cfg)
>>> inshop_train
Dataset InShop
    Number of samples: 25882
    The `CLASSES` meta info is not set.
    Root of dataset: data/inshop
>>>
>>> # build query InShop dataset
>>> inshop_query_cfg = dict(data_root='data/inshop', split='query')
>>> inshop_query = InShop(**inshop_query_cfg)
>>> inshop_query
Dataset InShop
    Number of samples: 14218
    The `CLASSES` meta info is not set.
    Root of dataset: data/inshop
>>>
>>> # build gallery InShop dataset
>>> inshop_gallery_cfg = dict(data_root='data/inshop', split='gallery')
>>> inshop_gallery = InShop(**inshop_gallery_cfg)
>>> inshop_gallery
Dataset InShop
    Number of samples: 12612
    The `CLASSES` meta info is not set.
    Root of dataset: data/inshop
Base classes
- class mmpretrain.datasets.BaseDataset(ann_file, metainfo=None, data_root='', data_prefix='', filter_cfg=None, indices=None, serialize_data=True, pipeline=(), test_mode=False, lazy_init=False, max_refetch=1000, classes=None)[source]
Base dataset for image classification task.
This dataset supports annotation files in the OpenMMLab 2.0 style annotation format.

Compared with mmengine.BaseDataset, this class implements several useful methods.

Parameters:
- ann_file (str) – Annotation file path.
- metainfo (dict, optional) – Meta information for the dataset, such as class information. Defaults to None.
- data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.
- data_prefix (str | dict) – Prefix for training data. Defaults to ''.
- filter_cfg (dict, optional) – Config for filtering data. Defaults to None.
- indices (int or Sequence[int], optional) – Support using only the first few data entries in the annotation file to facilitate training/testing on a smaller dataset. Defaults to None, which means using all data_infos.
- serialize_data (bool) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.
- pipeline (Sequence) – Processing pipeline. Defaults to an empty tuple.
- test_mode (bool, optional) – test_mode=True means the dataset is in the test phase, where an error will be raised when getting an item fails; test_mode=False means the dataset is in the training phase, where another item will be returned randomly. Defaults to False.
- lazy_init (bool) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.
- max_refetch (int) – The maximum number of extra cycles to get a valid image if BaseDataset.prepare_data gets a None image. Defaults to 1000.
- classes (str | Sequence[str], optional) – Specify names of classes.
  - If it is a string, it should be a file path, and every line of the file is the name of a class.
  - If it is a sequence of strings, every item is the name of a class.
  - If it is None, use the categories information in the metainfo argument, the annotation file, or the class attribute METAINFO.
  Defaults to None.
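A minimal sketch of the lazy_init behavior described above (the paths and class names are assumptions):

>>> from mmpretrain.datasets import BaseDataset
>>> # Skip loading annotations at construction time; only metainfo is prepared.
>>> dataset = BaseDataset(ann_file='meta/train.txt', data_root='data/my_dataset',
...                       classes=['cat', 'dog'], lazy_init=True)
>>> dataset.full_init()  # load the annotations later, when actually needed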
- class mmpretrain.datasets.MultiLabelDataset(ann_file, metainfo=None, data_root='', data_prefix='', filter_cfg=None, indices=None, serialize_data=True, pipeline=(), test_mode=False, lazy_init=False, max_refetch=1000, classes=None)[source]
Multi-label Dataset.
This dataset supports annotation files in the OpenMMLab 2.0 style annotation format.
The annotation format is shown as follows.
{ "metainfo": { "classes":['A', 'B', 'C'....] }, "data_list": [ { "img_path": "test_img1.jpg", 'gt_label': [0, 1], }, { "img_path": "test_img2.jpg", 'gt_label': [2], }, ] .... }
Parameters:
- ann_file (str) – Annotation file path.
- metainfo (dict, optional) – Meta information for the dataset, such as class information. Defaults to None.
- data_root (str) – The root directory for data_prefix and ann_file. Defaults to ''.
- data_prefix (str | dict) – Prefix for training data. Defaults to ''.
- filter_cfg (dict, optional) – Config for filtering data. Defaults to None.
- indices (int or Sequence[int], optional) – Support using only the first few data entries in the annotation file to facilitate training/testing on a smaller dataset. Defaults to None, which means using all data_infos.
- serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.
- pipeline (list, optional) – Processing pipeline. Defaults to [].
- test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Defaults to False.
- lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.
- max_refetch (int, optional) – The maximum number of extra cycles to get a valid image if BaseDataset.prepare_data gets a None image. Defaults to 1000.
- classes (str | Sequence[str], optional) – Specify names of classes.
  - If it is a string, it should be a file path, and every line of the file is the name of a class.
  - If it is a sequence of strings, every item is the name of a class.
  - If it is None, use the categories information in the metainfo argument, the annotation file, or the class attribute METAINFO.
  Defaults to None.
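A minimal sketch of building the dataset from an annotation file in the format above (the paths are assumptions):

>>> from mmpretrain.datasets import MultiLabelDataset
>>> dataset = MultiLabelDataset(ann_file='meta/train.json',
...                             data_root='data/my_dataset')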
Caltech101
- class mmpretrain.datasets.Caltech101(data_root, split='train', **kwargs)[source]
The Caltech101 Dataset.
Support the Caltech101 Dataset. After downloading and decompression, the dataset directory structure is as follows.
Caltech101 dataset directory:
caltech-101
├── 101_ObjectCategories
│   ├── class_x
│   │   ├── xx1.jpg
│   │   ├── xx2.jpg
│   │   └── ...
│   ├── class_y
│   │   ├── yy1.jpg
│   │   ├── yy2.jpg
│   │   └── ...
│   └── ...
├── Annotations
│   ├── class_x
│   │   ├── xx1.mat
│   │   └── ...
│   └── ...
├── meta
│   ├── train.txt
│   └── test.txt
└── ....
Please note that since there is no official split for the training and test sets, you can use the train.txt and test.txt provided by us or create your own annotation files. Here is the download link for the annotations.
Parameters:
- data_root (str) – The root directory for the Caltech101 dataset.
- split (str, optional) – The dataset split, supports "train" and "test". Defaults to "train".
- **kwargs – Other keyword arguments in BaseDataset.

Examples
>>> from mmpretrain.datasets import Caltech101
>>> train_dataset = Caltech101(data_root='data/caltech-101', split='train')
>>> train_dataset
Dataset Caltech101
    Number of samples: 3060
    Number of categories: 102
    Root of dataset: data/caltech-101
>>> test_dataset = Caltech101(data_root='data/caltech-101', split='test')
>>> test_dataset
Dataset Caltech101
    Number of samples: 6728
    Number of categories: 102
    Root of dataset: data/caltech-101
Food101
- class mmpretrain.datasets.Food101(data_root, split='train', **kwargs)[source]
The Food101 Dataset.
Support the Food101 Dataset. After downloading and decompression, the dataset directory structure is as follows.
Food101 dataset directory:
food-101
├── images
│   ├── class_x
│   │   ├── xx1.jpg
│   │   ├── xx2.jpg
│   │   └── ...
│   ├── class_y
│   │   ├── yy1.jpg
│   │   ├── yy2.jpg
│   │   └── ...
│   └── ...
├── meta
│   ├── train.txt
│   └── test.txt
└── ....
Parameters:
- data_root (str) – The root directory for the Food101 dataset.
- split (str, optional) – The dataset split, supports "train" and "test". Defaults to "train".
- **kwargs – Other keyword arguments in BaseDataset.

Examples
>>> from mmpretrain.datasets import Food101
>>> train_dataset = Food101(data_root='data/food-101', split='train')
>>> train_dataset
Dataset Food101
    Number of samples: 75750
    Number of categories: 101
    Root of dataset: data/food-101
>>> test_dataset = Food101(data_root='data/food-101', split='test')
>>> test_dataset
Dataset Food101
    Number of samples: 25250
    Number of categories: 101
    Root of dataset: data/food-101
DTD
- class mmpretrain.datasets.DTD(data_root, split='trainval', **kwargs)[source]
The Describable Texture Dataset (DTD).
Support the Describable Texture Dataset (DTD). After downloading and decompression, the dataset directory structure is as follows.
DTD dataset directory:
dtd
├── images
│   ├── banded
│   │   ├── banded_0002.jpg
│   │   ├── banded_0004.jpg
│   │   └── ...
│   └── ...
├── imdb
│   └── imdb.mat
├── labels
│   ├── labels_joint_anno.txt
│   ├── test1.txt
│   ├── test2.txt
│   └── ...
└── ....
Parameters:
- data_root (str) – The root directory for the DTD dataset.
- split (str, optional) – The dataset split, supports "train", "val", "trainval", and "test". Defaults to "trainval".
- **kwargs – Other keyword arguments in BaseDataset.

Examples
>>> from mmpretrain.datasets import DTD
>>> train_dataset = DTD(data_root='data/dtd', split='trainval')
>>> train_dataset
Dataset DTD
    Number of samples: 3760
    Number of categories: 47
    Root of dataset: data/dtd
>>> test_dataset = DTD(data_root='data/dtd', split='test')
>>> test_dataset
Dataset DTD
    Number of samples: 1880
    Number of categories: 47
    Root of dataset: data/dtd
FGVCAircraft
- class mmpretrain.datasets.FGVCAircraft(data_root, split='trainval', **kwargs)[source]
The FGVC_Aircraft Dataset.
Support the FGVC_Aircraft Dataset. After downloading and decompression, the dataset directory structure is as follows.
FGVC_Aircraft dataset directory:
fgvc-aircraft-2013b
└── data
    ├── images
    │   ├── 1.jpg
    │   ├── 2.jpg
    │   └── ...
    ├── images_variant_train.txt
    ├── images_variant_test.txt
    ├── images_variant_trainval.txt
    ├── images_variant_val.txt
    ├── variants.txt
    └── ....
Parameters:
- data_root (str) – The root directory for the FGVC_Aircraft dataset.
- split (str, optional) – The dataset split, supports "train", "val", "trainval", and "test". Defaults to "trainval".
- **kwargs – Other keyword arguments in BaseDataset.

Examples
>>> from mmpretrain.datasets import FGVCAircraft
>>> train_dataset = FGVCAircraft(data_root='data/fgvc-aircraft-2013b', split='trainval')
>>> train_dataset
Dataset FGVCAircraft
    Number of samples: 6667
    Number of categories: 100
    Root of dataset: data/fgvc-aircraft-2013b
>>> test_dataset = FGVCAircraft(data_root='data/fgvc-aircraft-2013b', split='test')
>>> test_dataset
Dataset FGVCAircraft
    Number of samples: 3333
    Number of categories: 100
    Root of dataset: data/fgvc-aircraft-2013b
Flowers102
- class mmpretrain.datasets.Flowers102(data_root, split='trainval', **kwargs)[source]
The Oxford 102 Flower Dataset.
Support the Oxford 102 Flowers Dataset. After downloading and decompression, the dataset directory structure is as follows.
Flowers102 dataset directory:
Flowers102
├── jpg
│   ├── image_00001.jpg
│   ├── image_00002.jpg
│   └── ...
├── imagelabels.mat
├── setid.mat
└── ...
Parameters:
- data_root (str) – The root directory for the Oxford 102 Flowers dataset.
- split (str, optional) – The dataset split, supports "train", "val", "trainval", and "test". Defaults to "trainval".
- **kwargs – Other keyword arguments in BaseDataset.

Examples
>>> from mmpretrain.datasets import Flowers102
>>> train_dataset = Flowers102(data_root='data/Flowers102', split='trainval')
>>> train_dataset
Dataset Flowers102
    Number of samples: 2040
    Root of dataset: data/Flowers102
>>> test_dataset = Flowers102(data_root='data/Flowers102', split='test')
>>> test_dataset
Dataset Flowers102
    Number of samples: 6149
    Root of dataset: data/Flowers102
StanfordCars
- class mmpretrain.datasets.StanfordCars(data_root, split='train', **kwargs)[source]
The Stanford Cars Dataset.
Support the Stanford Cars Dataset. The official website provides two ways to organize the dataset. Therefore, after downloading and decompression, the dataset directory structure is as follows.
Stanford Cars dataset directory:
Stanford_Cars
├── car_ims
│   ├── 00001.jpg
│   ├── 00002.jpg
│   └── ...
└── cars_annos.mat
or
Stanford_Cars
├── cars_train
│   ├── 00001.jpg
│   ├── 00002.jpg
│   └── ...
├── cars_test
│   ├── 00001.jpg
│   ├── 00002.jpg
│   └── ...
└── devkit
    ├── cars_meta.mat
    ├── cars_train_annos.mat
    ├── cars_test_annos.mat
    ├── cars_test_annoswithlabels.mat
    ├── eval_train.m
    └── train_perfect_preds.txt
Parameters:
- data_root (str) – The root directory for the Stanford Cars dataset.
- split (str, optional) – The dataset split, supports "train" and "test". Defaults to "train".
- **kwargs – Other keyword arguments in BaseDataset.

Examples
>>> from mmpretrain.datasets import StanfordCars
>>> train_dataset = StanfordCars(data_root='data/Stanford_Cars', split='train')
>>> train_dataset
Dataset StanfordCars
    Number of samples: 8144
    Number of categories: 196
    Root of dataset: data/Stanford_Cars
>>> test_dataset = StanfordCars(data_root='data/Stanford_Cars', split='test')
>>> test_dataset
Dataset StanfordCars
    Number of samples: 8041
    Number of categories: 196
    Root of dataset: data/Stanford_Cars
OxfordIIITPet
- class mmpretrain.datasets.OxfordIIITPet(data_root, split='trainval', **kwargs)[source]
The Oxford-IIIT Pets Dataset.
Support the Oxford-IIIT Pets Dataset. After downloading and decompression, the dataset directory structure is as follows.
Oxford-IIIT_Pets dataset directory:
Oxford-IIIT_Pets
├── images
│   ├── Abyssinian_1.jpg
│   ├── Abyssinian_2.jpg
│   └── ...
├── annotations
│   ├── trainval.txt
│   ├── test.txt
│   ├── list.txt
│   └── ...
└── ....
Parameters:
- data_root (str) – The root directory for the Oxford-IIIT Pets dataset.
- split (str, optional) – The dataset split, supports "trainval" and "test". Defaults to "trainval".
- **kwargs – Other keyword arguments in BaseDataset.

Examples
>>> from mmpretrain.datasets import OxfordIIITPet
>>> train_dataset = OxfordIIITPet(data_root='data/Oxford-IIIT_Pets', split='trainval')
>>> train_dataset
Dataset OxfordIIITPet
    Number of samples: 3680
    Number of categories: 37
    Root of dataset: data/Oxford-IIIT_Pets
>>> test_dataset = OxfordIIITPet(data_root='data/Oxford-IIIT_Pets', split='test')
>>> test_dataset
Dataset OxfordIIITPet
    Number of samples: 3669
    Number of categories: 37
    Root of dataset: data/Oxford-IIIT_Pets
SUN397
- class mmpretrain.datasets.SUN397(data_root, split='train', **kwargs)[source]
The SUN397 Dataset.
Support the SUN397 Dataset. After downloading and decompression, the dataset directory structure is as follows.
SUN397 dataset directory:
SUN397
├── SUN397
│   ├── a
│   │   ├── abbey
│   │   │   ├── sun_aaalbzqrimafwbiv.jpg
│   │   │   └── ...
│   │   ├── airplane_cabin
│   │   │   ├── sun_aadqdkqaslqqoblu.jpg
│   │   │   └── ...
│   │   └── ...
│   ├── b
│   │   └── ...
│   ├── c
│   │   └── ...
│   └── ...
└── Partitions
    ├── ClassName.txt
    ├── Training_01.txt
    ├── Testing_01.txt
    └── ...
Parameters:
- data_root (str) – The root directory for the SUN397 dataset.
- split (str, optional) – The dataset split, supports "train" and "test". Defaults to "train".
- **kwargs – Other keyword arguments in BaseDataset.

Examples
>>> from mmpretrain.datasets import SUN397
>>> train_dataset = SUN397(data_root='data/SUN397', split='train')
>>> train_dataset
Dataset SUN397
    Number of samples: 19824
    Number of categories: 397
    Root of dataset: data/SUN397
>>> test_dataset = SUN397(data_root='data/SUN397', split='test')
>>> test_dataset
Dataset SUN397
    Number of samples: 19829
    Number of categories: 397
    Root of dataset: data/SUN397
Dataset Wrappers
- class mmpretrain.datasets.KFoldDataset(dataset, fold=0, num_splits=5, test_mode=False, seed=None)[source]
A wrapper of dataset for K-Fold cross-validation.
K-Fold cross-validation divides all the samples into groups of samples of almost equal size, called folds. We use k-1 folds for training and the remaining fold for validation.
Parameters:
- dataset (mmengine.dataset.BaseDataset | dict) – The dataset to be divided.
- fold (int) – The fold used for validation. Defaults to 0.
- num_splits (int) – The total number of folds. Defaults to 5.
- test_mode (bool) – Whether to use the training dataset or the validation dataset. Defaults to False.
- seed (int, optional) – The seed to shuffle the dataset before splitting. If None, the dataset is not shuffled. Defaults to None.
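A minimal sketch of 5-fold cross-validation (the wrapped CIFAR10 config is an assumption; any dataset config works):

>>> from mmpretrain.datasets import KFoldDataset
>>> base = dict(type='CIFAR10', data_root='data/cifar10', split='train')
>>> # Use fold 0 for validation and the remaining 4 folds for training.
>>> train_set = KFoldDataset(base, fold=0, num_splits=5, test_mode=False, seed=0)
>>> val_set = KFoldDataset(base, fold=0, num_splits=5, test_mode=True, seed=0)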
The following dataset wrappers from MMEngine can be used directly in MMPreTrain.
- ConcatDataset: A wrapper of concatenated dataset.
- RepeatDataset: A wrapper of repeated dataset.
- ClassBalancedDataset: A wrapper of class balanced dataset.