Tutorial 2: Customize Data Pipelines

Overview of Pipeline

DataSource and Pipeline are two important components of a Dataset. DataSource was introduced in add_new_dataset. Pipeline is responsible for applying a series of data augmentations to images, such as random flipping.
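This is how the two components fit together. A DataSource loads a raw sample, and the Pipeline then augments it step by step. The sketch below illustrates that split under stated assumptions. The class names ListDataSource and SimpleDataset are hypothetical, and the get_length/get_sample method names are illustrative choices, not a confirmed API. Strings stand in for images so the code runs without Pillow.

```python
from typing import Any, Callable, List, Sequence


class ListDataSource:
    """Hypothetical DataSource: it only knows how to load raw samples."""

    def __init__(self, samples: Sequence[Any]):
        self._samples = list(samples)

    def get_length(self) -> int:
        return len(self._samples)

    def get_sample(self, idx: int) -> Any:
        return self._samples[idx]


class SimpleDataset:
    """Hypothetical Dataset pairing a DataSource with a Pipeline."""

    def __init__(self, data_source: ListDataSource,
                 pipeline: List[Callable[[Any], Any]]):
        self.data_source = data_source
        self.pipeline = pipeline

    def __len__(self) -> int:
        return self.data_source.get_length()

    def __getitem__(self, idx: int) -> Any:
        img = self.data_source.get_sample(idx)  # 1. load the raw sample
        for transform in self.pipeline:         # 2. augment it in list order
            img = transform(img)
        return img


if __name__ == "__main__":
    # Strings stand in for images; upper-casing then reversing stands in for augmentations.
    ds = SimpleDataset(ListDataSource(["cat", "dog"]), [str.upper, lambda s: s[::-1]])
    print(len(ds), ds[0])  # 2 TAC
```

With this split, swapping the augmentation recipe only touches the pipeline list, and the data-loading code stays unchanged.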

Here is an example Pipeline config for SimCLR training:

train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomHorizontalFlip'),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.8,
                contrast=0.8,
                saturation=0.8,
                hue=0.2)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.5)
]
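Each dict in the config names a registered transform class through its type key. The remaining keys are passed to that class's constructor. The sketch below shows this mechanism with a hypothetical minimal Registry and build_from_cfg. The real codebase ships its own registry utilities, which this does not reproduce. The toy AddConstant and Scale transforms work on numbers and stand in for image augmentations.

```python
from typing import Any, Callable, Dict, List


class Registry:
    """Hypothetical minimal registry mapping class names to classes."""

    def __init__(self, name: str):
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self):
        # Returns a decorator, so it is used as @REGISTRY.register_module()
        def _register(cls: type) -> type:
            self._modules[cls.__name__] = cls
            return cls
        return _register

    def get(self, key: str) -> type:
        if key not in self._modules:
            raise KeyError(f"{key} is not registered in {self.name}")
        return self._modules[key]


PIPELINES = Registry("pipeline")


def build_from_cfg(cfg: Dict[str, Any], registry: Registry) -> Callable:
    args = dict(cfg)  # copy so pop() does not mutate the caller's config
    cls = registry.get(args.pop("type"))
    return cls(**args)  # every remaining key becomes a keyword argument


def build_pipeline(cfgs: List[Dict[str, Any]], registry: Registry) -> List[Callable]:
    return [build_from_cfg(cfg, registry) for cfg in cfgs]


@PIPELINES.register_module()
class AddConstant:
    """Toy transform on numbers, standing in for an image augmentation."""

    def __init__(self, value: int = 1):
        self.value = value

    def __call__(self, img):
        return img + self.value


@PIPELINES.register_module()
class Scale:
    """Toy transform on numbers, standing in for an image augmentation."""

    def __init__(self, factor: int):
        self.factor = factor

    def __call__(self, img):
        return img * self.factor


if __name__ == "__main__":
    cfg = [dict(type="AddConstant", value=2), dict(type="Scale", factor=10)]
    img = 1
    for t in build_pipeline(cfg, PIPELINES):
        img = t(img)
    print(img)  # (1 + 2) * 10 = 30
```

Under this scheme, dict(type='RandomGrayscale', p=0.2) in the config above would resolve to a call like RandomGrayscale(p=0.2).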

Every augmentation in the Pipeline receives an image as input and outputs an augmented image.
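Every step maps an image to an image, so the steps can be chained. Wrappers such as RandomAppliedTrans in the config above are also image-to-image callables. The sketch below assumes that RandomAppliedTrans applies its inner transforms with probability p and otherwise passes the image through. Both Compose and RandomApplied here are hypothetical re-implementations, not the library's classes. Numbers stand in for images.

```python
import random
from typing import Any, Callable, List, Optional


class Compose:
    """Chains image -> image callables, applying them in list order."""

    def __init__(self, transforms: List[Callable[[Any], Any]]):
        self.transforms = transforms

    def __call__(self, img: Any) -> Any:
        for t in self.transforms:
            img = t(img)
        return img


class RandomApplied:
    """Hypothetical RandomAppliedTrans-style wrapper: with probability p
    apply all wrapped transforms in order, otherwise return img unchanged."""

    def __init__(self, transforms: List[Callable[[Any], Any]], p: float = 0.5,
                 rng: Optional[random.Random] = None):
        self.inner = Compose(transforms)
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, img: Any) -> Any:
        # rng.random() is in [0, 1), so p=1.0 always applies and p=0.0 never does
        if self.rng.random() < self.p:
            return self.inner(img)
        return img


if __name__ == "__main__":
    pipeline = Compose([lambda x: x + 1, RandomApplied([lambda x: x * 2], p=1.0)])
    print(pipeline(3))  # (3 + 1) * 2 = 8
```

torchvision offers similar building blocks (transforms.Compose and transforms.RandomApply(transforms, p)). Whether this codebase wraps those is not stated here.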

Creating new augmentations in Pipeline

1. Write a new transformation class in transforms.py and override its __call__ method, which takes a Pillow image as input:

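# PIPELINES is the transform registry used by the other transforms in transforms.py
@PIPELINES.register_module()
class MyTransform(object):

    def __call__(self, img):
        # img is a Pillow image: apply your augmentation here
        # and return the augmented Pillow image
        return img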
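A transform usually needs parameters, and config keys arrive as constructor arguments. The sketch below is a hypothetical MyRandomHFlip that mirrors an image with probability p. Two stand-ins keep it runnable without the codebase or Pillow. First, a stub PIPELINES registry replaces the real one. Second, a nested list of grayscale values replaces the Pillow image. With a real Pillow image, ImageOps.mirror(img) would perform the same flip.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # stand-in for a Pillow image: rows of grayscale pixels


class _StubRegistry:
    """Placeholder so this sketch runs standalone; not the real PIPELINES."""

    def __init__(self):
        self.modules = {}

    def register_module(self):
        def _register(cls):
            self.modules[cls.__name__] = cls
            return cls
        return _register


PIPELINES = _StubRegistry()


@PIPELINES.register_module()
class MyRandomHFlip(object):
    """Hypothetical custom transform: mirror each row with probability p."""

    def __init__(self, p: float = 0.5, seed: Optional[int] = None):
        # Keys from dict(type='MyRandomHFlip', p=...) arrive here as kwargs.
        self.p = p
        self.rng = random.Random(seed)

    def __call__(self, img: Image) -> Image:
        if self.rng.random() < self.p:
            return [row[::-1] for row in img]  # horizontal flip
        return img

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(p={self.p})"


if __name__ == "__main__":
    print(MyRandomHFlip(p=1.0)([[1, 2, 3], [4, 5, 6]]))  # [[3, 2, 1], [6, 5, 4]]
```

In a config, this transform would then be listed as dict(type='MyRandomHFlip', p=0.5). A readable __repr__ makes printed pipelines easier to inspect.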

2. Use it in config files. Here we reuse the config shown above and add MyTransform to it:

train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomHorizontalFlip'),
    dict(type='MyTransform'),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.8,
                contrast=0.8,
                saturation=0.8,
                hue=0.2)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.5)
]