# Notebook distributed training

> Using `Accelerate` to launch a training script from your notebook

## Overview
In this tutorial we will see how to use `Accelerate` to launch a training function on a distributed system, from inside your notebook! To keep it simple, this example will follow training on the PETs dataset, showcasing how all it takes is three new lines of code to be on your way!
## Setting up imports and building the DataLoaders
First, make sure that Accelerate is installed on your system by running:

```bash
pip install accelerate -U
```
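If you want to confirm the install took effect in the current kernel, a quick sanity check (not part of the original tutorial) is to print the package version:

```python
# illustrative check: confirm Accelerate is importable in this kernel
import accelerate
print(accelerate.__version__)
```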
In your code, along with the normal `from fastai.module.all import *` imports, two new ones need to be added:

```diff
+ from fastai.distributed import *
from fastai.vision.all import *
from fastai.vision.models.xresnet import *

+ from accelerate import notebook_launcher
+ from accelerate.utils import write_basic_config
```
The first brings in the `Learner.distrib_ctx` context manager. The second brings in Accelerate's `notebook_launcher`, the key function we will call to run what we want.
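To get a feel for `notebook_launcher` before wiring in fastai, here is a minimal sketch; the `hello` function is a made-up placeholder, not part of the tutorial:

```python
from accelerate import notebook_launcher

def hello():
    # each spawned worker process runs this function
    print("hello from a training process")

# spawns two processes and runs `hello` in each;
# num_processes should match the number of GPUs you have
notebook_launcher(hello, (), num_processes=2)
```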
We need to set up Accelerate to use all of our GPUs. We can do so quickly with `write_basic_config()`:

```python
#from accelerate.utils import write_basic_config
#write_basic_config()
```

Since this checks `torch.cuda.device_count`, you will need to restart your notebook and skip calling this again to continue. It only needs to be run once! Also, if you choose not to use this, run `accelerate config` from the terminal and set `mixed_precision` to `no`.
Next let's download some data to train on. You don't need to worry about using `rank0_first`, since we're in our Jupyter Notebook it will only run on one process like normal:

```python
path = untar_data(URLs.PETS)
```
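For contrast, in a standalone training script launched across multiple processes you would typically wrap the download in `rank0_first`, so that rank 0 fetches the data first and the other ranks reuse its cache. A minimal sketch of what that might look like:

```python
from fastai.distributed import rank0_first
from fastai.vision.all import untar_data, URLs

# rank 0 runs untar_data first; the other ranks wait, then run it
# and hit the already-downloaded cache
path = rank0_first(untar_data, URLs.PETS)
```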
We wrap the creation of the `DataLoaders`, our `vision_learner`, and the call to `fine_tune` inside of a `train` function.

It is important to not build the `DataLoaders` outside of the function, as absolutely nothing can be loaded onto CUDA beforehand.
```python
def get_y(o): return o[0].isupper()

def train(path):
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=get_y, item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()
    learn.fine_tune(1)
```
The last addition needed in the `train` function is to use our context manager before calling `fine_tune`, setting `in_notebook` to `True`. Note that for this example `sync_bn` is disabled for compatibility purposes with `torchvision`'s `resnet34`:
```python
def train(path):
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=get_y, item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()
    with learn.distrib_ctx(sync_bn=False, in_notebook=True):
        learn.fine_tune(1)
    learn.export("pets")
```
Finally, just call `notebook_launcher`, passing in the training function, any arguments as a tuple, and the number of GPUs (processes) to use:

```python
notebook_launcher(train, (path,), num_processes=2)
```
```
Launching training on 2 GPUs.
Training Learner...
```

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.342019 | 0.228441 | 0.105041 | 00:54 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.197188 | 0.141764 | 0.062246 | 00:56 |
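The two tables correspond to the two stages of `fine_tune`: a frozen warm-up epoch first, then the one unfrozen epoch we asked for.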
Afterwards we can import our exported `Learner`, save, or anything else we may want to do in our Jupyter Notebook outside of a distributed process:

```python
imgs = get_image_files(path)
learn = load_learner(path/'pets')
learn.predict(imgs[0])
```

```
('False', TensorBase(0), TensorBase([0.9718, 0.0282]))
```
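As usual for fastai, the tuple returned by `learn.predict` is the decoded label, the class index as a tensor, and the per-class probabilities.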