MixUp and Friends

Callbacks that can apply the MixUp (and variants) data augmentation to your training
from fastai.vision.all import *

source

reduce_loss

 reduce_loss (loss:torch.Tensor, reduction:str='mean')

Reduce the loss based on reduction

| | **Type** | **Default** | **Details** |
|---|---|---|---|
| loss | Tensor | | |
| reduction | str | mean | PyTorch loss reduction |
| **Returns** | **Tensor** | | |
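
For example (a minimal sketch; reduce_loss is brought in by the wildcard import above, and a reduction other than 'mean' or 'sum' is assumed to leave the tensor unchanged):

per_item = torch.rand(16)               # un-reduced, per-item losses
reduce_loss(per_item, 'mean')           # scalar: mean over items
reduce_loss(per_item, 'sum')            # scalar: sum over items
reduce_loss(per_item, 'none').shape     # torch.Size([16]), unchanged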

source

MixHandler

 MixHandler (alpha:float=0.5)

A handler class for implementing MixUp style scheduling

| | **Type** | **Default** | **Details** |
|---|---|---|---|
| alpha | float | 0.5 | Determines the Beta distribution; valid range (0., inf] |

Most Mix variants perform the data augmentation on the batch, so to implement your own variant you should adjust the before_batch event to whatever your training regimen requires. If a different loss function is needed, you should also override lf. alpha is passed to Beta to create a sampler.

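For instance, here is a minimal sketch of a custom variant. It closely mirrors fastai's own MixUp.before_batch; the class name BlendMix is hypothetical and only for illustration:

class BlendMix(MixHandler):
    "Hypothetical Mix variant: blend each image with a shuffled partner"
    def before_batch(self):
        # Sample one lambda per item from the Beta sampler built by MixHandler
        lam = self.distrib.sample((self.y.size(0),)).squeeze().to(self.x.device)
        self.lam = torch.stack([lam, 1-lam], 1).max(1)[0]   # keep >= 50% of the original
        # Pick partner images/targets by shuffling the batch
        shuffle = torch.randperm(self.y.size(0)).to(self.x.device)
        xb1 = tuple(L(self.xb).itemgot(shuffle))
        self.yb1 = tuple(L(self.yb).itemgot(shuffle))       # used by lf to blend the losses
        # Linearly interpolate original and partner inputs
        nx_dims = len(self.x.size())
        self.learn.xb = tuple(L(xb1, self.xb).map_zip(
            torch.lerp, weight=unsqueeze(self.lam, n=nx_dims-1)))

MixHandler.lf then computes the loss on both sets of targets and interpolates them with the same self.lam.
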

source

MixUp

 MixUp (alpha:float=0.4)

Implementation of https://arxiv.org/abs/1710.09412

| | **Type** | **Default** | **Details** |
|---|---|---|---|
| alpha | float | 0.4 | Determines the Beta distribution; valid range (0., inf] |

This is a modified implementation of mixup that will always blend at least 50% of the original image. The original paper calls for a Beta distribution with the same value of alpha used for both of its shape parameters (alpha = beta). Unlike the original paper, this implementation selects the max of lambda and 1-lambda: if the sampled lambda is less than 0.5 (i.e. the original image would be represented at less than 50%), 1-lambda is used instead.

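In symbols, each mixed input is a convex combination of two images, with the adjustment described above:

\[\tilde{x} = \lambda\,x_i + (1 - \lambda)\,x_j,\qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha),\qquad \lambda \leftarrow \max(\lambda, 1 - \lambda)\]

The loss is interpolated between the two targets with the same \(\lambda\).
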
The blending of two images is determined by alpha.

\(\alpha=1.\):

  • All values between 0 and 1 have an equal chance of being sampled (the distribution is uniform).
  • Any amount of mixing between the two images is possible.

\(\alpha<1.\):

  • Values close to 0 and 1 become more likely to be sampled than values near 0.5.
  • It is more likely that one of the images will dominate, with only a slight amount of the other mixed in.

\(\alpha>1.\):

  • Values close to 0.5 become more likely than values close to 0 or 1.
  • It is more likely that the images will be blended evenly.
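
A quick empirical check of these three regimes (illustrative only; the exact fractions will vary from run to run):

from torch.distributions.beta import Beta

for a in (0.4, 1., 4.):
    lam = Beta(torch.tensor(a), torch.tensor(a)).sample((10_000,))
    near_half = ((lam > 0.4) & (lam < 0.6)).float().mean().item()
    print(f'alpha={a}: {near_half:.0%} of samples fall in (0.4, 0.6)')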

First we'll look at a very minimal example to show how our data is being generated, using the PETS dataset:

path = untar_data(URLs.PETS)
pat        = r'([^/]+)_\d+.*$'    # extract the pet breed from the filename
fnames     = get_image_files(path/'images')
item_tfms  = [Resize(256, method='crop')]
batch_tfms = [*aug_transforms(size=224), Normalize.from_stats(*imagenet_stats)]
dls = ImageDataLoaders.from_name_re(path, fnames, pat, bs=64, item_tfms=item_tfms,
                                    batch_tfms=batch_tfms)

We can examine the results of our Callback by grabbing the transformed batch during fit at the before_batch event, like so:

mixup = MixUp(1.)
with Learner(dls, nn.Linear(3,4), loss_func=CrossEntropyLossFlat(), cbs=mixup) as learn:
    learn.epoch,learn.training = 0,True
    learn.dl = dls.train
    b = dls.one_batch()
    learn._split(b)
    learn('before_train')
    learn('before_batch')

_,axs = plt.subplots(3,3, figsize=(9,9))
dls.show_batch(b=(mixup.x,mixup.y), ctxs=axs.flatten())
(Output: a 3×3 grid of training images, several of which are visibly blended with a second image.)

We can see that every so often an image gets “mixed” with another.

How do we train? You can pass the Callback either to the Learner directly or to the cbs argument of your fit function:

learn = vision_learner(dls, resnet18, loss_func=CrossEntropyLossFlat(), metrics=[error_rate])
learn.fit_one_cycle(1, cbs=mixup)
| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 2.041960 | 0.495492 | 0.162382 | 00:12 |
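
Equivalently, you can attach the callback to the Learner itself, in which case it applies to every subsequent fit (a sketch of the first option mentioned above):

learn = vision_learner(dls, resnet18, loss_func=CrossEntropyLossFlat(),
                       metrics=[error_rate], cbs=MixUp(0.4))
learn.fit_one_cycle(1)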

source

CutMix

 CutMix (alpha:float=1.0)

Implementation of https://arxiv.org/abs/1905.04899

| | **Type** | **Default** | **Details** |
|---|---|---|---|
| alpha | float | 1.0 | Determines the Beta distribution; valid range (0., inf] |

Similar to MixUp, CutMix mixes two images, but rather than blending them it cuts a random box out of one image and pastes it into the other, mixing the targets in proportion to the patch area (see the sketch below).

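The size of the swapped box is tied to the same lambda drawn from \(\mathrm{Beta}(\alpha, \alpha)\). Here is a minimal sketch of the box sampling, following the rand_bbox routine from the paper (names and exact clipping are illustrative):

import torch

def rand_bbox(H, W, lam):
    # Box side ratio is sqrt(1 - lam), so the box area is a (1 - lam) fraction of the image
    cut_rat = (1. - lam) ** 0.5
    cut_h, cut_w = int(H * cut_rat), int(W * cut_rat)
    # Uniformly sample the box center, then clip the corners to the image
    cy, cx = int(torch.randint(H, (1,))), int(torch.randint(W, (1,)))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, H)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, W)
    return y1, y2, x1, x2

After the swap, lambda is recomputed from the clipped box area so that the target mixing matches the fraction of pixels actually replaced. We can look at a few examples below:
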
cutmix = CutMix(1.)
with Learner(dls, nn.Linear(3,4), loss_func=CrossEntropyLossFlat(), cbs=cutmix) as learn:
    learn.epoch,learn.training = 0,True
    learn.dl = dls.train
    b = dls.one_batch()
    learn._split(b)
    learn('before_train')
    learn('before_batch')

_,axs = plt.subplots(3,3, figsize=(9,9))
dls.show_batch(b=(cutmix.x,cutmix.y), ctxs=axs.flatten())
(Output: a 3×3 grid of training images, several of which contain a rectangular patch pasted in from another image.)

We train with it in exactly the same way:

learn = vision_learner(dls, resnet18, loss_func=CrossEntropyLossFlat(), metrics=[accuracy, error_rate])
learn.fit_one_cycle(1, cbs=cutmix)
| epoch | train_loss | valid_loss | accuracy | error_rate | time |
|---|---|---|---|---|---|
| 0 | 3.440883 | 0.793059 | 0.769959 | 0.230041 | 00:12 |