# Training callbacks

### ShortEpochCallback

> ShortEpochCallback (pct=0.01, short_valid=True)

*Fit just `pct` of an epoch, then stop*

``` python
from fastai.callback.all import *
from fastai.test_utils import synth_learner
learn = synth_learner()
learn.fit(1, cbs=ShortEpochCallback())
```

| epoch | train_loss | valid_loss | time  |
|-------|------------|------------|-------|
| 0     |            |            | 00:00 |

With `short_valid=False`, the validation phase is not cut short, so a validation loss is reported:

``` python
learn = synth_learner()
learn.fit(1, cbs=ShortEpochCallback(short_valid=False))
```

| epoch | train_loss | valid_loss | time  |
|-------|------------|------------|-------|
| 0     |            | 8.432135   | 00:00 |
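To make the mechanism concrete, here is a pure-Python sketch of an early-stopping hook in a toy training loop. It assumes, as a simplification, that the callback checks after each batch whether the fraction of iterations already completed has reached `pct`, and then raises an exception that the loop catches to end the training phase. `ToyShortEpoch`, `CancelTrain` and `run_train_phase` are hypothetical names, not fastai API.

```python
class CancelTrain(Exception):
    """Raised by a callback to end the training phase of the current epoch."""


class ToyShortEpoch:
    """Hypothetical stand-in for ShortEpochCallback: stop after `pct` of the batches."""

    def __init__(self, pct=0.01):
        self.pct = pct

    def after_batch(self, i, n_iter):
        # keep going while the fraction of the epoch already done is below pct
        if i / n_iter < self.pct:
            return
        raise CancelTrain()


def run_train_phase(n_iter, cb):
    """Toy loop: 'train' on batches 0..n_iter-1, calling cb.after_batch after each one."""
    seen = []
    try:
        for i in range(n_iter):
            seen.append(i)  # stands in for forward/backward/step on batch i
            cb.after_batch(i, n_iter)
    except CancelTrain:
        pass  # the remaining batches of this epoch are skipped
    return seen


print(len(run_train_phase(100, ToyShortEpoch(pct=0.01))))  # 2
print(len(run_train_phase(100, ToyShortEpoch(pct=0.1))))   # 11
```

Because the check runs after a batch has finished, at least one batch is always processed, even for a very small `pct`.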

### GradientAccumulation

> GradientAccumulation (n_acc=32)

*Accumulate gradients before updating weights*

When `n_acc` (the number of samples over which gradients are accumulated before each optimizer step) is larger than the number of training samples in an epoch, the optimizer never takes a step, so the parameters (and therefore the validation loss) don't change at all:

``` python
import torch
learn = synth_learner()
# snapshot the weights before training
params_before = [p.detach().clone() for p in learn.model.parameters()]
learn.fit(1, lr=0.01, cbs=GradientAccumulation(n_acc=1000))
# no optimizer step was taken, so the weights (and hence valid_loss) are unchanged
assert all(torch.equal(a, b) for a, b in zip(params_before, learn.model.parameters()))
```

| epoch | train_loss | valid_loss | time  |
|-------|------------|------------|-------|
| 0     | 20.987558  | 26.849480  | 00:00 |
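The accumulation logic can be illustrated without fastai. This is a minimal sketch under stated assumptions: `n_acc` counts samples, each batch's mean gradient is weighted by `batch_size / n_acc`, and the optimizer steps once the running sample count reaches `n_acc`. The scalar model `y_hat = w * x`, plain SGD, and the helper names `grad_mse` and `train_with_accumulation` are illustrative choices, not fastai code.

```python
import math


def grad_mse(w, xs, ys):
    """Mean-squared-error gradient dL/dw for the toy model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train_with_accumulation(w, batches, n_acc, lr):
    """Accumulate weighted batch gradients until n_acc samples are seen, then take one SGD step."""
    acc_grad, count, steps = 0.0, 0, 0
    for xs, ys in batches:
        bs = len(xs)
        # weight each batch's mean gradient by its share of the n_acc samples
        acc_grad += grad_mse(w, xs, ys) * bs / n_acc
        count += bs
        if count < n_acc:
            continue  # not enough samples yet: skip the optimizer step
        w -= lr * acc_grad  # one step with the accumulated gradient
        acc_grad, count, steps = 0.0, 0, steps + 1
    return w, steps


batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
w, steps = train_with_accumulation(0.0, batches, n_acc=4, lr=0.01)
full_w = 0.0 - 0.01 * grad_mse(0.0, [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(steps, math.isclose(w, full_w))  # 1 True
print(train_with_accumulation(0.0, batches, n_acc=1000, lr=0.01))  # (0.0, 0)
```

With two equal-sized batches of 2 and `n_acc=4`, the single accumulated step matches one step on the combined batch of 4. With `n_acc=1000`, no step is ever taken, which mirrors the unchanged validation loss above.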

### GradientClip

> GradientClip (max_norm:float=1.0, norm_type:float=2.0)

*Clip norm of gradients*

Normally, if we use a learning rate that is too high, training diverges. This happens even with mixed precision training, which uses dynamic loss scaling to avoid infinities:

``` python
fp16 = MixedPrecision()
```

``` python
from fastai.torch_core import set_seed
set_seed(99)
learn = synth_learner(lr=1.1, cuda=True)
learn.fit(3, cbs=fp16)
```

| epoch | train_loss | valid_loss  | time  |
|-------|------------|-------------|-------|
| 0     | 38.214138  | 25.269005   | 00:00 |
| 1     | 377.145508 | 890.010376  | 00:00 |
| 2     | 839.392883 | 9965.747070 | 00:00 |
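The losses in the run above are large but finite, which is why loss scaling alone cannot help. The following is a minimal sketch of dynamic loss scaling under a common scheme: the scale is cut when gradients contain inf/NaN (and that step is skipped), and it is grown again after a run of clean steps. The defaults are modeled on PyTorch's `torch.cuda.amp.GradScaler`. `ToyLossScaler` and its methods are hypothetical, and fastai's `MixedPrecision` may differ in its details.

```python
import math


class ToyLossScaler:
    """Hypothetical sketch of dynamic loss scaling; not fastai's or PyTorch's code."""

    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale_loss(self, loss):
        # multiply the loss so small fp16 gradients don't underflow during backward
        return loss * self.scale

    def unscale_or_skip(self, scaled_grads):
        """Return the true gradients, or None if the optimizer step should be skipped."""
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads):
            # overflow: shrink the scale and skip this step entirely
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return None
        grads = [g / self.scale for g in scaled_grads]
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # a long run without overflow: try a larger scale again
            self.scale *= self.growth_factor
            self._good_steps = 0
        return grads


scaler = ToyLossScaler(init_scale=8.0, growth_interval=2)
print(scaler.unscale_or_skip([math.inf, 1.0]), scaler.scale)  # None 4.0
print(scaler.unscale_or_skip([4000.0]))  # [1000.0]
```

The second call shows the gap: a gradient of 1000 is not an overflow, so it passes straight through to the optimizer, and a too-high learning rate still drives the weights away.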

By adding the [`GradientClip`](https://docs.fast.ai/callback.training.html#gradientclip) callback, the gradient norm (of order `norm_type`, default 2) is clipped to at most `max_norm` (default 1) using `nn.utils.clip_grad_norm_`, which can prevent the loss from diverging:

``` python
set_seed(99)
learn = synth_learner(lr=1.1, cuda=True)
learn.fit(3, cbs=[GradientClip, fp16])
```

| epoch | train_loss | valid_loss | time  |
|-------|------------|------------|-------|
| 0     | 2.039428   | 2.372177   | 00:00 |
| 1     | 1.402425   | 0.300728   | 00:00 |
| 2     | 1.013548   | 0.332610   | 00:00 |
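Norm-based clipping computes a single norm over all gradients together and, when it exceeds `max_norm`, rescales every gradient by the same factor, so the update keeps its direction but gets a bounded length. The pure-Python sketch below applies that idea to nested lists of floats. It is modeled on how `torch.nn.utils.clip_grad_norm_` is documented to behave; the function name `toy_clip_grad_norm` is hypothetical and the `1e-6` epsilon is an assumption.

```python
import math


def toy_clip_grad_norm(grads, max_norm=1.0, norm_type=2.0):
    """Rescale lists of gradients in place so their combined norm is at most max_norm.

    Returns the total norm measured before clipping.
    """
    flat = [g for vec in grads for g in vec]
    if math.isinf(norm_type):
        total = max((abs(g) for g in flat), default=0.0)
    else:
        total = sum(abs(g) ** norm_type for g in flat) ** (1.0 / norm_type)
    coef = max_norm / (total + 1e-6)  # small epsilon avoids division by zero
    if coef < 1.0:
        # every gradient is scaled by the same factor, so the direction is preserved
        for vec in grads:
            for i in range(len(vec)):
                vec[i] *= coef
    return total


grads = [[3.0, 0.0], [4.0]]       # two "parameters" with global L2 norm 5
print(toy_clip_grad_norm(grads))  # 5.0
print(grads)                      # each entry scaled by about 1/5
```

Gradients whose combined norm is already below `max_norm` are left untouched, so clipping only intervenes on the oversized steps that caused the divergence above.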

## BnFreeze

### BnFreeze

> BnFreeze (after_create=None, before_fit=None, before_epoch=None,
> before_train=None, before_batch=None, after_pred=None,
> after_loss=None, before_backward=None,
> after_cancel_backward=None, after_backward=None,
> before_step=None, after_cancel_step=None, after_step=None,
> after_cancel_batch=None, after_batch=None,
> after_cancel_train=None, after_train=None,
> before_validate=None, after_cancel_validate=None,
> after_validate=None, after_cancel_epoch=None, after_epoch=None,
> after_cancel_fit=None, after_fit=None)

*Basic class handling tweaks of the training loop by changing a [`Learner`](https://docs.fast.ai/learner.html#learner) in various events*

### set_bn_eval

> set_bn_eval (m:torch.nn.modules.module.Module, use_eval=True)

*Set bn layers in eval mode for all recursive children of `m`.*

[`BnFreeze`](https://docs.fast.ai/callback.training.html#bnfreeze) is useful when you'd like to train two separate models that share a common feature extractor (body). The only part of the models that differs is the head that you attach for transfer learning.
[`Learner.freeze()`](https://docs.fast.ai/learner.html#learner.freeze) is not enough here: the [`BatchNorm`](https://docs.fast.ai/layers.html#batchnorm) layers are trainable by default, and their running statistics (mean and variance) are updated from every training batch. For the feature extractors to fully match, you need to set `train_bn=False` and also freeze these running statistics, which is precisely what [`BnFreeze`](https://docs.fast.ai/callback.training.html#bnfreeze) does.

``` python
from copy import deepcopy
from fastai.vision.all import *
path = untar_data(URLs.MNIST_TINY)
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2)
```

We first demonstrate the mismatch of the running stats when using only `train_bn=False`, by creating a [`Learner`](https://docs.fast.ai/learner.html#learner)…:

``` python
learn1 = vision_learner(deepcopy(dls), resnet18, pretrained=True, train_bn=False)
```

…then grab the first [`BatchNorm`](https://docs.fast.ai/layers.html#batchnorm) layer and store its running mean:

``` python
m = learn1.model[0][1].running_mean.clone()
```

You can see that the running mean has now changed:

``` python
learn1.fit(1, lr=0.02)
test_ne(to_detach(learn1.model[0][1].running_mean), m)
```

| epoch | train_loss | valid_loss | time  |
|-------|------------|------------|-------|
| 0     | 1.148303   | 0.739404   | 00:12 |
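Why does the running mean move even though `train_bn=False` keeps the batchnorm weights frozen? The running statistics are buffers updated during the forward pass whenever the layer is in training mode; the optimizer never touches them. The pure-Python sketch below shows this for a hypothetical one-feature `ToyBatchNorm`. It assumes PyTorch's documented update rule `running = (1 - momentum) * running + momentum * batch_stat` with the default momentum of 0.1, and it tracks only the mean (the variance would be handled the same way).

```python
class ToyBatchNorm:
    """Hypothetical one-feature batchnorm that tracks only a running mean."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.running_mean = 0.0
        self.training = True
        self.weight_requires_grad = False  # affine weights frozen, as with train_bn=False

    def eval(self):
        self.training = False

    def __call__(self, batch):
        if self.training:
            batch_mean = sum(batch) / len(batch)
            # buffer update: happens in training mode regardless of weight_requires_grad
            self.running_mean = ((1 - self.momentum) * self.running_mean
                                 + self.momentum * batch_mean)
            mean = batch_mean
        else:
            mean = self.running_mean  # eval mode only reads the stored statistic
        return [x - mean for x in batch]


bn = ToyBatchNorm()
bn([1.0, 3.0])
print(bn.running_mean)  # 0.2: moved from 0.0 although the weights are frozen
bn.eval()
bn([10.0, 20.0])
print(bn.running_mean)  # 0.2: unchanged in eval mode
```

Freezing the weights therefore only stops gradient updates; keeping the statistics fixed requires putting the layer in eval mode.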

When we use the [`BnFreeze`](https://docs.fast.ai/callback.training.html#bnfreeze) callback, the running statistics are not changed during training. This is often important for getting good results from transfer learning.

``` python
learn1 = vision_learner(deepcopy(dls), resnet18, pretrained=True, train_bn=False, cbs=BnFreeze)
m = learn1.model[0][1].running_mean.detach().clone()
learn1.fit(1, lr=0.02)
test_eq(to_detach(learn1.model[0][1].running_mean), m)
```

| epoch | train_loss | valid_loss | time  |
|-------|------------|------------|-------|
| 0     | 0.478594   | 0.270772   | 00:10 |
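The recursive walk described by `set_bn_eval` can be sketched in pure Python. This sketch assumes that only batchnorm layers whose parameters do not require gradients are switched, so trainable batchnorm layers in the head keep updating their statistics. It also assumes the walk is repeated at the start of each training phase, because the whole model is put back into train mode there. `Layer` and `toy_set_bn_eval` are hypothetical stand-ins for `nn.Module` and fastai's function.

```python
class Layer:
    """Hypothetical stand-in for an nn.Module: child layers plus a training flag."""

    def __init__(self, name, children=(), is_bn=False, requires_grad=True):
        self.name = name
        self.children = list(children)
        self.is_bn = is_bn
        self.requires_grad = requires_grad
        self.training = True


def toy_set_bn_eval(m, use_eval=True):
    """Recursively put frozen batchnorm descendants of m into eval (or back into train) mode."""
    for child in m.children:
        if child.is_bn and not child.requires_grad:
            child.training = not use_eval
        toy_set_bn_eval(child, use_eval)


body = Layer("body", [Layer("conv1"), Layer("bn1", is_bn=True, requires_grad=False)])
head = Layer("head", [Layer("bn_head", is_bn=True, requires_grad=True)])
model = Layer("model", [body, head])
toy_set_bn_eval(model)
print(body.children[1].training, head.children[0].training)  # False True
```

Only the frozen `bn1` in the body stops tracking statistics; the trainable `bn_head` and the non-batchnorm `conv1` keep their training mode.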