Using `Accelerate` to launch a training script from your notebook
 

Overview

In this tutorial we will see how to use Accelerate to launch a training function on a distributed system, from inside your notebook!

To keep things simple, this example follows training on the Pets dataset, showcasing that all it takes is three new lines of code to be on your way!

Setting up imports and building the DataLoaders

First, make sure that Accelerate is installed on your system by running:

```bash
pip install accelerate -U
```

In your code, along with the normal `from fastai.module.all import *` imports, a few new ones need to be added:

```diff
+ from fastai.distributed import *
from fastai.vision.all import *
from fastai.vision.models.xresnet import *

+ from accelerate import notebook_launcher
+ from accelerate.utils import write_basic_config
```

The first new import brings in the `Learner.distrib_ctx` context manager. The others bring in Accelerate's `notebook_launcher`, the key function we will call to run our training, and `write_basic_config`, which we will use to configure Accelerate.

We need to set up Accelerate to use all of our GPUs. We can do so quickly with `write_basic_config()` (it only needs to be run once, which is why it is commented out here):

```python
#write_basic_config()
```
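`write_basic_config` writes a default configuration file for Accelerate based on the machine it runs on. As an illustrative sketch only (the exact fields and values depend on your Accelerate version and hardware), the generated YAML looks roughly like:

```yaml
# Sketch of a generated default config; fields and values are
# version- and machine-dependent
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: 'no'
num_processes: 2
```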

Next let's download some data to train on. You don't need to worry about using `rank0_first` here, since we're in our Jupyter Notebook and this will only run on one process like normal:

```python
path = untar_data(URLs.PETS)
```
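For context, `rank0_first` only matters in a true multi-process run: it has rank 0 execute a function (such as a dataset download) before the other ranks run it. A minimal sketch of the idea, using a hypothetical `rank0_first_sketch` helper and the `RANK` environment variable (both illustrative assumptions, not fastai's actual implementation):

```python
import os

def rank0_first_sketch(fn):
    # Sketch only: in a real distributed run, rank 0 calls fn() first (e.g. to
    # download data), the other ranks wait at a barrier, then call fn()
    # themselves and hit the now-warm cache. In a plain notebook there is a
    # single process, so fn() simply runs once.
    rank = int(os.environ.get("RANK", "0"))
    if rank == 0:
        result = fn()
        # ...a real implementation would signal a barrier here...
        return result
    # ...non-zero ranks would wait on the barrier, then run fn()...
    return fn()

print(rank0_first_sketch(lambda: "downloaded"))
```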

We wrap the creation of the DataLoaders, our vision_learner, and the call to fine_tune inside a train function:

```python
def get_y(o): return o[0].isupper()
def train(path):
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=get_y, item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()
    learn.fine_tune(1)
```
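In the Pets dataset the label is encoded in the filename's capitalization (cat breed filenames are capitalized, dog breeds are not), which is exactly what `get_y` checks:

```python
def get_y(o): return o[0].isupper()

# First character uppercase -> cat, lowercase -> dog
print(get_y("Birman_1.jpg"))   # True
print(get_y("beagle_3.jpg"))   # False
```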

The last addition needed in the train function is to wrap the call to fine_tune in our context manager, setting in_notebook to True. We also export the trained Learner so it can be used afterwards:

```python
def train(path):
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=get_y, item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()
    with learn.distrib_ctx(sync_bn=False, in_notebook=True):
        learn.fine_tune(1)
    learn.export("pets")
```

Finally, just call notebook_launcher, passing in the training function, any arguments as a tuple, and the number of GPUs (processes) to use:

```python
notebook_launcher(train, (path,), num_processes=2)
```

```
Launching training on 2 GPUs.
Training Learner...
epoch  train_loss  valid_loss  error_rate  time
0      0.342019    0.228441    0.105041    00:54
epoch  train_loss  valid_loss  error_rate  time
0      0.197188    0.141764    0.062246    00:56
```

Afterwards we can load our exported Learner back in our Jupyter Notebook, outside of a distributed process, and predict, save, or do anything else we may want:

```python
imgs = get_image_files(path)
learn = load_learner(path/'pets')
learn.predict(imgs[0])
```

```
('False', TensorBase(0), TensorBase([0.9718, 0.0282]))
```