Implementation of the callback system

Classes for callback implementors

fastai provides a powerful callback system, which is documented on the callbacks page; see that page if you just want to use existing callbacks. If you want to create your own, you'll need to use the classes discussed below.

A key motivation for the callback system is that additional functionality can be implemented entirely in a single callback, so that it's easy to read. This way, each training technique lives in its own callback, where all the interventions it makes in training are clearly stated. For instance, the LRFinder callback, on top of running the fit function with exponentially growing LRs, needs to handle some preparation and clean-up; all of this code can live in the same callback, so we know exactly what it is doing and where to look if we need to change something.

In addition, it allows our fit function to be very clean and simple, yet still easily extended. So far in implementing a number of recent papers, we haven't yet come across any situation where we had to modify our training loop source code - we've been able to use callbacks every time.

class Callback[source]

Callback()

Base class for callbacks that want to record values, dynamically change learner params, etc.

To create a new type of callback, you'll need to inherit from this class, and implement one or more methods as required for your purposes. Perhaps the easiest way to get started is to look at the source code for some of the pre-defined fastai callbacks. You might be surprised at how simple they are! For instance, here is the entire source code for GradientClipping:

@dataclass
class GradientClipping(LearnerCallback):
    clip:float
    def on_backward_end(self, **kwargs):
        if self.clip:
            nn.utils.clip_grad_norm_(self.learn.model.parameters(), self.clip)

You generally want your custom callback constructor to take a Learner parameter, e.g.:

@dataclass
class MyCallback(Callback):
    learn:Learner

Note that this allows the callback user to just pass your callback name to callback_fns when constructing their Learner, since that always passes self when constructing callbacks from callback_fns. In addition, by passing the learner, this callback will have access to everything: e.g. all the inputs/outputs as they are calculated, the losses, and also the data loaders, the optimizer, etc. At any time, as the sketch after this list illustrates:

  • Changing self.learn.data.train_dl or self.learn.data.valid_dl will change them inside the fit function (this works because we pass the DataBunch object to the fit function, not data.train_dl/data.valid_dl directly)
  • Changing self.learn.opt.opt (we have an OptimWrapper on top of the actual optimizer) will change it inside the fit function.
  • Changing self.learn.data or self.learn.opt directly WILL NOT change the data or the optimizer inside the fit function.
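For instance, here is a minimal sketch of a callback that follows these rules to swap the training dataloader at the start of each epoch. The callback name and the new_dls parameter are hypothetical, and we assume the usual fastai imports:

@dataclass
class DlSwitcher(LearnerCallback):
    "Hypothetical sketch: replace the training dataloader each epoch."
    new_dls:list  # one dataloader per epoch, supplied by the user
    def on_epoch_begin(self, epoch, **kwargs):
        # Mutating self.learn.data.train_dl is seen inside the fit function;
        # rebinding self.learn.data itself would not be.
        self.learn.data.train_dl = self.new_dls[epoch]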

In any of the callbacks, you can unpack the following from the kwargs (see the sketch after this list for an example):

  • n_epochs, contains the total number of epochs the training will take
  • epoch, contains the number of the current epoch
  • iteration, contains the number of iterations done since the beginning of training
  • num_batch, contains the number of the batch we're at in the dataloader
  • last_input, contains the last input that went through the model (possibly modified by a callback)
  • last_target, contains the last target that went through the model (possibly modified by a callback)
  • last_output, contains the last output produced by the model (possibly modified by a callback)
  • last_loss, contains the last loss computed (possibly modified by a callback)
  • smooth_loss, contains the smoothed version of the loss
  • last_metrics, contains the last validation loss and metrics computed
  • pbar, the progress bar
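For instance, here is an illustrative sketch of a callback that unpacks just the kwargs it needs to report the smoothed loss every few iterations (the callback name and interval are made up):

@dataclass
class PrintLoss(LearnerCallback):
    "Illustrative sketch: print the smoothed loss every `every` iterations."
    every:int = 100
    def on_batch_end(self, iteration, smooth_loss, **kwargs):
        # iteration and smooth_loss are unpacked straight from the kwargs above
        if iteration % self.every == 0:
            print(f'iter {iteration}: smooth loss {float(smooth_loss):.4f}')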

Methods your subclass can implement

All of these methods are optional; your subclass can handle as many or as few as you require.

on_train_begin[source]

on_train_begin(kwargs:Any)

To initialize constants in the callback.

Here we can initialize anything we need. The optimizer has now been initialized. We can change any hyper-parameters by typing, for instance:

self.opt.lr = new_lr      # learning rate
self.opt.mom = new_mom    # momentum
self.opt.wd = new_wd      # weight decay
self.opt.beta = new_beta  # secondary momentum (betas[1] for Adam, alpha for RMSprop)
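For instance, a minimal sketch of a callback (name and value are illustrative) that sets the starting learning rate this way, reaching the optimizer through self.learn as described above:

@dataclass
class InitLR(LearnerCallback):
    "Illustrative sketch: fix the starting LR once the optimizer exists."
    start_lr:float = 1e-3
    def on_train_begin(self, **kwargs):
        self.learn.opt.lr = self.start_lr  # the OptimWrapper exposes lr/mom/wd/beta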

on_epoch_begin[source]

on_epoch_begin(kwargs:Any)

At the beginning of each epoch.

This is not technically required, since we have on_train_begin for epoch 0 and on_epoch_end for all the other epochs, yet it makes writing code that needs to run at the beginning of every epoch easier and more readable.

on_batch_begin[source]

on_batch_begin(kwargs:Any)

Set HP before the step is done. Returns xb, yb (which can allow us to modify the input at that step if needed).

Here is the perfect place to prepare everything before the model is called. Example: change the values of the hyperparameters (if we don't do it in on_batch_end instead).

If we return something, that will be the new value for xb,yb.
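For instance, an illustrative sketch (assuming import torch and the usual fastai imports) of a callback that perturbs the input on each training batch by returning new values:

@dataclass
class InputNoise(LearnerCallback):
    "Illustrative sketch: add gaussian noise to the input of each training batch."
    noise:float = 0.01
    def on_batch_begin(self, last_input, last_target, train, **kwargs):
        if not train: return
        # Whatever we return becomes the new xb,yb for this step.
        return last_input + self.noise * torch.randn_like(last_input), last_target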

on_loss_begin[source]

on_loss_begin(kwargs:Any)

Called after forward pass but before loss has been computed. Returns the output (which can allow us to modify it).

Here is the place to run some code that needs to be executed after the output has been computed but before the loss computation. Example: putting the output back in FP32 when training in mixed precision.

If we return something, that will be the new value for the output.
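A minimal sketch of the FP32 idea mentioned above (not the actual mixed-precision implementation), assuming the model output is a plain tensor:

class Float32Output(LearnerCallback):
    "Illustrative sketch: cast the model output back to FP32 before the loss."
    def on_loss_begin(self, last_output, **kwargs):
        # Whatever we return becomes the new output passed to the loss function.
        return last_output.float()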

on_backward_begin[source]

on_backward_begin(kwargs:Any)

Called after the forward pass and the loss has been computed, but before backprop. Returns the loss (which can allow us to modify it, for instance for reg functions)

Here is the place to run some code that needs to be executed after the loss has been computed but before the gradient computation. Example: reg_fn in RNNs.

If we return something, that will be the new value for loss. Since the recorder is always called first, it will have the raw loss.
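For instance, an illustrative sketch of a regularization callback in the spirit of the RNN reg_fn (names and penalty are made up, and we assume the model output is a plain tensor):

@dataclass
class OutputReg(LearnerCallback):
    "Illustrative sketch: add an activation penalty to the loss before backprop."
    alpha:float = 2.
    def on_backward_begin(self, last_loss, last_output, **kwargs):
        # The returned value is the loss that gets backpropagated; the
        # Recorder was called first, so it logged the raw loss.
        return last_loss + self.alpha * last_output.float().pow(2).mean()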

on_backward_end[source]

on_backward_end(kwargs:Any)

Called after backprop but before optimizer step. Useful for true weight decay in AdamW.

Here is the place to run some code that needs to be executed after the gradients have been computed but before the optimizer is called.

on_step_end[source]

on_step_end(kwargs:Any)

Called after the step of the optimizer but before the gradients are zeroed.

Here is the place to run some code that needs to be executed after the optimizer step but before the gradients are zeroed.

on_batch_end[source]

on_batch_end(kwargs:Any)

Called at the end of the batch.

Here is the place to run some code that needs to be executed after a batch is fully done. Example: change the values of the hyperparameters (if we don't do it in on_batch_begin instead).

If we return True, the current epoch is interrupted (example: lr_finder stops the training when the loss explodes).
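For instance, a minimal sketch (name and threshold are illustrative) in the spirit of lr_finder's stopping rule:

@dataclass
class StopOnExplode(LearnerCallback):
    "Illustrative sketch: interrupt the epoch when the loss blows up."
    max_loss:float = 1e4
    def on_batch_end(self, smooth_loss, **kwargs):
        # Returning True interrupts the current epoch.
        if smooth_loss > self.max_loss: return True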

on_epoch_end[source]

on_epoch_end(kwargs:Any) → bool

Called at the end of an epoch.

Here is the place to run some code that needs to be executed at the end of an epoch. Example: Save the model if we have a new best validation loss/metric.

If we return True, the training stops (example: early stopping).
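For instance, a bare-bones early-stopping sketch (names and logic are illustrative; we assume a validation set, so last_metrics[0] is the validation loss):

@dataclass
class SimpleEarlyStopping(LearnerCallback):
    "Illustrative sketch: stop training when the validation loss stops improving."
    patience:int = 3
    def on_train_begin(self, **kwargs):
        self.best,self.wait = float('inf'),0
    def on_epoch_end(self, last_metrics, **kwargs):
        val_loss = last_metrics[0]
        if val_loss < self.best: self.best,self.wait = val_loss,0
        else:
            self.wait += 1
            if self.wait > self.patience: return True  # True stops training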

on_train_end[source]

on_train_end(kwargs:Any)

Useful for cleaning up things and saving files/models.

Here is the place to tidy everything up. It's always executed, even if there was an error during the training loop, and has an extra kwarg named exception to check whether there was an exception or not. Examples: save log files, load the best model found during training.
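For instance, an illustrative sketch that reports how training ended using the extra exception kwarg:

class TidyUp(LearnerCallback):
    "Illustrative sketch: runs even if training failed."
    def on_train_end(self, exception, **kwargs):
        # exception is False when training completed, or the raised Exception.
        if exception: print(f'training interrupted: {exception!r}')
        # e.g. save log files or reload the best saved model here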

Annealing functions

The following functions provide different annealing schedules. You probably won't need to call them directly, but would instead use them as part of a callback. The sketch after these definitions shows the shape each one produces.

annealing_cos[source]

annealing_cos(start:Number, end:Number, pct:float) → Number

Cosine anneal from start to end as pct goes from 0.0 to 1.0.

annealing_exp[source]

annealing_exp(start:Number, end:Number, pct:float) → Number

Exponentially anneal from start to end as pct goes from 0.0 to 1.0.

annealing_linear[source]

annealing_linear(start:Number, end:Number, pct:float) → Number

Linearly anneal from start to end as pct goes from 0.0 to 1.0.

annealing_no[source]

annealing_no(start:Number, end:Number, pct:float) → Number

No annealing, always return start.

annealing_poly[source]

annealing_poly(degree:Number) → Number

Anneal polynomially from start to end as pct goes from 0.0 to 1.0. (Unlike the others, annealing_poly takes the degree and returns the corresponding annealing function of start, end, and pct.)
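Here is an illustrative sketch that plots the shape of each schedule over 100 points, assuming matplotlib and numpy are available alongside the usual fastai imports:

import numpy as np
import matplotlib.pyplot as plt

pcts = np.linspace(0., 1., 100)
schedules = {'cos': annealing_cos, 'exp': annealing_exp,
             'linear': annealing_linear, 'no': annealing_no,
             'poly(2)': annealing_poly(2)}
for name, fn in schedules.items():
    plt.plot(pcts, [fn(2., 1e-2, pct) for pct in pcts], label=name)
plt.legend(); plt.show()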

class CallbackHandler[source]

CallbackHandler(callbacks:Collection[Callback]=None, metrics:Collection[Callback]=None, beta:float=0.98)

Manage all of the registered callback objects, smoothing loss by momentum beta.

You probably won't need to use this class yourself. It's used by fastai to combine all the callbacks together and call any relevant callback functions for each training stage. The methods below simply call the equivalent method in each callback function in self.callbacks.
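For orientation only, here is a heavily simplified sketch of how a training loop might drive these methods; the real fastai loop handles more cases, and some signatures and return values are simplified here:

def simple_fit(epochs, learn, callbacks):
    cb_handler = CallbackHandler(callbacks)
    cb_handler.on_train_begin(epochs, pbar=None, metrics=[])
    for epoch in range(epochs):
        cb_handler.on_epoch_begin()
        for xb,yb in learn.data.train_dl:
            xb,yb = cb_handler.on_batch_begin(xb, yb)
            out = cb_handler.on_loss_begin(learn.model(xb))
            loss = cb_handler.on_backward_begin(learn.loss_func(out, yb))
            loss.backward()
            cb_handler.on_backward_end()
            learn.opt.step()
            cb_handler.on_step_end()
            learn.opt.zero_grad()
            if cb_handler.on_batch_end(loss): break
        if cb_handler.on_epoch_end(None): break   # val_loss omitted in this sketch
    cb_handler.on_train_end(False)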

on_backward_begin[source]

on_backward_begin(loss:Tensor)

Handle gradient calculation on loss.

on_backward_end[source]

on_backward_end()

Handle end of gradient calculation.

on_batch_begin[source]

on_batch_begin(xb:Tensor, yb:Tensor, train:bool=True)

Handle new batch xb,yb.

on_batch_end[source]

on_batch_end(loss:Tensor)

Handle end of processing one batch with loss.

on_epoch_begin[source]

on_epoch_begin()

Handle new epoch.

on_epoch_end[source]

on_epoch_end(val_loss:Tensor) → bool

Epoch is done, process val_metrics.

on_loss_begin[source]

on_loss_begin(out:Tensor)

Handle start of loss calculation with model output out.

on_step_end[source]

on_step_end()

Handle end of optimization step.

on_train_begin[source]

on_train_begin(epochs:int, pbar:PBar, metrics:MetricFuncList)

About to start learning.

on_train_end[source]

on_train_end(exception:Union[bool, Exception])

Handle end of training, exception is an Exception or False if no exceptions during training.

class OptimWrapper[source]

OptimWrapper(opt:Optimizer, wd:Floats=0.0, true_wd:bool=False, bn_wd:bool=True)

Basic wrapper around an optimizer to simplify HP changes.

This is a convenience class that provides a consistent API for getting and setting optimizer hyperparameters. For instance, for optim.Adam the momentum parameter is actually betas[0], whereas for optim.SGD it's simply momentum. As another example, the details of handling weight decay depend on whether you are using true_wd or the traditional L2 regularization approach.

This class also handles setting different WD and LR for each layer group, for discriminative layer training.
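A hedged usage sketch (the layer groups and values are illustrative, and we assume that create, documented below, forwards the extra keyword arguments wd and true_wd to the wrapper):

from torch import nn, optim

groups = [nn.Linear(10, 10), nn.Linear(10, 2)]  # illustrative layer groups
opt = OptimWrapper.create(optim.Adam, 1e-3, groups, wd=1e-2, true_wd=True)
opt.lr = 1e-4    # a scalar is broadcast to every layer group
opt.mom = 0.95   # maps to betas[0] for Adam, momentum for SGD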

create[source]

create(opt_func:Callable, lr:Union[float, Tuple, List], layer_groups:ModuleList, kwargs:Any) → Optimizer

Create an optim.Optimizer from opt_func with lr. Set lr on layer_groups.

read_defaults[source]

read_defaults()

Read the values inside the optimizer for the hyper-parameters.

read_val[source]

read_val(key:str) → Union[List[float], Tuple[List[float], List[float]]]

Read a hyperparameter key in the optimizer dictionary.

set_val[source]

set_val(key:str, val:Any, bn_groups:bool=True) → Any

Set the values inside the optimizer dictionary at the key.

step[source]

step()

Set weight decay and step optimizer.

zero_grad[source]

zero_grad()

Clear optimizer gradients.

class SmoothenValue[source]

SmoothenValue(beta:float)

Create a smooth moving average for a value (loss, etc).

Used for smoothing loss in Recorder.

add_value[source]

add_value(val:float)

Add current value to calculate updated smoothed value.
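A small usage sketch, assuming the smoothed value is exposed as the smooth attribute (this is how Recorder reads it):

sv = SmoothenValue(beta=0.98)
for raw_loss in [1.0, 0.8, 0.9, 0.7]:
    sv.add_value(raw_loss)
    print(sv.smooth)  # debiased exponential moving average of the values so far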

class Stepper[source]

Stepper(vals:StartOptEnd, n_iter:int, func:Optional[AnnealFunc]=None)

Used to "step" from start,end (vals) over n_iter iterations on a schedule defined by func

Used for creating annealing schedules, mainly for OneCycleScheduler.

step[source]

step() → Number

Return next value along annealed schedule.
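A small usage sketch, assuming vals is given as a (start, end) tuple as the StartOptEnd annotation suggests:

sched = Stepper((1., 0.1), 5, annealing_cos)  # from 1. to 0.1 over 5 iterations
vals = [sched.step() for _ in range(5)]       # the cosine-annealed values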

class AverageMetric[source]

AverageMetric(func) :: Callback

Wrap a func in a callback for metrics computation.

See the documentation on metrics for more information.