learn = synth_learner()
learn.fit(1, cbs=ShortEpochCallback())
learn = synth_learner()
learn.fit(1, cbs=ShortEpochCallback(short_valid=False))
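By default ShortEpochCallback stops after a small fraction of the batches (its pct argument, default 0.01), and short_valid controls whether the validation phase is shortened as well. As a minimal sketch (reusing the synth_learner setup above), you can pass a larger pct to run more of each epoch:
# Sketch: run roughly 20% of each epoch instead of the default 1%
learn = synth_learner()
learn.fit(1, cbs=ShortEpochCallback(pct=0.2))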
When the number of steps per accumulation is higher than the number of batches, the parameters (and therefore validation loss) don't change at all:
learn = synth_learner()
learn.fit(1, lr=0.01, cbs=GradientAccumulation(n_acc=1000))
# ensure valid_loss didn't change
assert learn.recorder.values[-1][1] == learn.recorder.values[0][1]
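In typical use you set n_acc to the effective batch size you want to simulate; n_acc counts samples rather than batches, so the optimizer steps roughly once every n_acc/bs batches. A minimal sketch, assuming the same synth_learner setup:
# Sketch: accumulate gradients until ~64 samples have been seen before each optimizer step
learn = synth_learner()
learn.fit(1, lr=0.01, cbs=GradientAccumulation(n_acc=64))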
Normally, if we use a learning rate that is too high, our training will diverge. This happens even with mixed precision training, which avoids infinities by using dynamic loss scaling but still diverges:
fp16 = MixedPrecision()
set_seed(99)
learn = synth_learner(lr=1.1, cuda=True)
learn.fit(3, cbs=fp16)
By adding the GradientClip callback, the norm_type (default: 2) norm of the gradients is clipped to at most max_norm (default: 1) using nn.utils.clip_grad_norm_, which can avoid loss divergence:
set_seed(99)
learn = synth_learner(lr=1.1, cuda=True)
learn.fit(3, cbs=[GradientClip, fp16])
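Both max_norm and norm_type can be changed when instantiating the callback; a minimal sketch with a stricter clipping threshold:
# Sketch: clip gradients to a smaller norm for a tighter cap on each update
set_seed(99)
learn = synth_learner(lr=1.1, cuda=True)
learn.fit(3, cbs=[GradientClip(max_norm=0.5), fp16])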
BnFreeze is useful when you'd like to train two separate models that have a common feature extractor/body. The only part of the model that's different is the head you attach for transfer learning. Learner.freeze() doesn't suffice here, as the BatchNorm layers are trainable by default and track the running mean and std of batches. For the feature extractors to fully match, you need to set train_bn=False, and these statistics need to be frozen as well, which is precisely the function of BnFreeze.
path = untar_data(URLs.MNIST_TINY)
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2)
We first demonstrate the mismatch of the running stats when using only train_bn=False, by creating a Learner...:
learn1 = vision_learner(deepcopy(dls), resnet18, pretrained=True, train_bn=False)
...and grab the first BatchNorm layer and store its running mean:
m = learn1.model[0][1].running_mean.clone()
You can see that after training, the running mean has changed:
learn1.fit(1, lr=0.02)
test_ne(to_detach(learn1.model[0][1].running_mean), m)
When we use the BnFreeze callback, the running statistics will not be changed during training. This is often important for getting good results from transfer learning.
learn1 = vision_learner(deepcopy(dls), resnet18, pretrained=True, train_bn=False, cbs=BnFreeze)
m = learn1.model[0][1].running_mean.detach().clone()
learn1.fit(1, lr=0.02)
test_eq(to_detach(learn1.model[0][1].running_mean), m)
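A minimal sketch of how you might use BnFreeze in an ordinary transfer-learning loop (using fine_tune here, which is not part of the demonstration above):
# Sketch: keep BatchNorm stats frozen in the body while fine-tuning the head
learn = vision_learner(dls, resnet18, pretrained=True, train_bn=False, cbs=BnFreeze)
learn.fine_tune(1)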
Channels Last
[Beta] A simple callback to use channels last memory format
This callback sets your model in channels_last memory format before training, which can enable speed-ups on modern GPUs. You can read more about this beta feature in the PyTorch docs: https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html
learn1 = vision_learner(deepcopy(dls), resnet18, pretrained=True, cbs=ChannelsLast())
learn1.fit(1, lr=0.02)
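channels_last typically pays off most when combined with mixed precision on tensor-core GPUs; a sketch reusing the MixedPrecision callback shown earlier:
# Sketch: combine channels_last with mixed precision for larger speed-ups on modern GPUs
learn1 = vision_learner(deepcopy(dls), resnet18, pretrained=True,
                        cbs=[ChannelsLast(), MixedPrecision()])
learn1.fit(1, lr=0.02)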
timm models