Vision learner

All the functions necessary to build a Learner suitable for transfer learning in computer vision.

The most important functions of this module are vision_learner and unet_learner. They will help you define a Learner using a pretrained model. See the vision tutorial for examples of use.

Cut a pretrained model

By default, the fastai library cuts a pretrained model at the pooling layer. This function helps detect it.


source

has_pool_type

 has_pool_type (m)

Return True if m is a pooling layer or has one in its children

m = nn.Sequential(nn.AdaptiveAvgPool2d(5), nn.Linear(2,3), nn.Conv2d(2,3,1), nn.MaxPool3d(5))
assert has_pool_type(m)
test_eq([has_pool_type(m_) for m_ in m.children()], [True,False,False,True])

source

cut_model

 cut_model (model, cut)

Cut an instantiated model
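
For example, cut can be an integer (keep the first cut children of the model) or a function applied to the whole model. A minimal sketch (the lambda is just illustrative):

m = nn.Sequential(nn.Conv2d(3,5,3), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
test_eq(len(cut_model(m, 2)), 2)                # integer: keep the first two layers
test_eq(len(cut_model(m, lambda m: m[:1])), 1)  # callable: cut_model returns cut(model)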


source

create_body

 create_body (model, n_in=3, pretrained=True, cut=None)

Cut off the body of a typically pretrained arch as determined by cut

cut can either be an integer, in which case we cut the model at the corresponding layer, or a function, in which case this function returns cut(model). If cut is None, it defaults to cutting at the first layer that contains some pooling.

def tst(): return nn.Sequential(nn.Conv2d(3,5,3), nn.BatchNorm2d(5), nn.AvgPool2d(1), nn.Linear(3,4))
m = create_body(tst())
test_eq(len(m), 2)

m = create_body(tst(), cut=3)
test_eq(len(m), 3)

m = create_body(tst(), cut=noop)
test_eq(len(m), 4)

for n in range(1,5):    
    m = create_body(tst(), n_in=n)
    test_eq(_get_first_layer(m)[0].in_channels, n)

Head and model


source

create_head

 create_head (nf, n_out, lin_ftrs=None, ps=0.5, pool=True,
              concat_pool=True, first_bn=True, bn_final=False,
              lin_first=False, y_range=None)

Model head that takes nf features, runs through lin_ftrs, and outputs n_out classes.

The head begins with fastai’s AdaptiveConcatPool2d if concat_pool=True; otherwise, it uses traditional average pooling. It then uses a Flatten layer before going on to blocks of BatchNorm, Dropout and Linear layers (if lin_first=True, those are Linear, BatchNorm, Dropout).

Those blocks start at nf, go through every element of lin_ftrs (defaults to [512]) and end at n_out. ps is a list of probabilities used for the dropouts (if you only pass one value, it will use half that value for the first dropout, then that value as many times as necessary).

If first_bn=True, a BatchNorm layer is added just after the pooling operations. If bn_final=True, a final BatchNorm layer is added. If y_range is passed, the function adds a SigmoidRange to constrain the output to that range.

tst = create_head(5, 10)
tst
Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): fastai.layers.Flatten(full=False)
  (2): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25, inplace=False)
  (4): Linear(in_features=10, out_features=512, bias=False)
  (5): ReLU(inplace=True)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5, inplace=False)
  (8): Linear(in_features=512, out_features=10, bias=False)
)

source

default_split

 default_split (m)

Default split of a model between body and head

To do transfer learning, you need to pass a splitter to Learner. This should be a function that takes the model and returns a collection of parameter groups, e.g. a list of lists of parameters.
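
For instance, a minimal custom splitter (a sketch mirroring the default split; my_splitter is a hypothetical name) can put the body parameters in one group and the head parameters in another, assuming the Sequential(body, head) structure produced by create_vision_model:

# Hypothetical splitter: one group for the body (m[0]), one for the head (m[1:])
def my_splitter(m): return L(m[0], m[1:]).map(params)

# learn = vision_learner(dls, models.resnet18, splitter=my_splitter)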


source

add_head

 add_head (body, nf, n_out, init=<function kaiming_normal_>, head=None,
           concat_pool=True, pool=True, lin_ftrs=None, ps=0.5,
           first_bn=True, bn_final=False, lin_first=False, y_range=None)

Add a head to a vision body
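
For example, a model can be assembled by hand from a body and a generated head (a sketch; num_features_model is used to infer nf for a standard torchvision backbone):

body = create_body(models.resnet18(), pretrained=False)
nf = num_features_model(body)        # 512 for a resnet18 body
model = add_head(body, nf, n_out=10)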


source

create_vision_model

 create_vision_model (arch, n_out, pretrained=True, weights=None,
                      cut=None, n_in=3, init=<function kaiming_normal_>,
                      custom_head=None, concat_pool=True, pool=True,
                      lin_ftrs=None, ps=0.5, first_bn=True,
                      bn_final=False, lin_first=False, y_range=None)

Create custom vision architecture

The model is cut according to cut and may be pretrained, in which case the proper set of weights is downloaded and then loaded. init is applied to the head of the model, which is either created by create_head (using lin_ftrs, ps, concat_pool, bn_final, lin_first and y_range) or is custom_head.

tst = create_vision_model(models.resnet18, 10, True)
tst = create_vision_model(models.resnet18, 10, True, n_in=1)  # adapt the first layer to 1-channel input

source

TimmBody

 TimmBody (model, pretrained:bool=True, cut=None, n_in:int=3)

TimmBody wraps a timm model so it can be used as the body of a fastai vision model: the model is cut according to cut, and its forward pass returns the features that the head built by create_timm_model consumes.


source

create_timm_model

 create_timm_model (arch, n_out, cut=None, pretrained=True, n_in=3,
                    init=<function kaiming_normal_>, custom_head=None,
                    concat_pool=True, pool=True, lin_ftrs=None, ps=0.5,
                    first_bn=True, bn_final=False, lin_first=False,
                    y_range=None, **kwargs)

Create custom architecture using arch, n_in and n_out from the timm library

# make sure that timm models can be scripted:
tst, _ = create_timm_model('resnet34', 1)
scripted = torch.jit.script(tst)
assert scripted, "model could not be converted to TorchScript"

Learner convenience functions


source

vision_learner

 vision_learner (dls, arch, normalize=True, n_out=None, pretrained=True,
                 weights=None, loss_func=None, opt_func=<function Adam>,
                 lr=0.001, splitter=None, cbs=None, metrics=None,
                 path=None, model_dir='models', wd=None, wd_bn_bias=False,
                 train_bn=True, moms=(0.95, 0.85, 0.95), cut=None,
                 init=<function kaiming_normal_>, custom_head=None,
                 concat_pool=True, pool=True, lin_ftrs=None, ps=0.5,
                 first_bn=True, bn_final=False, lin_first=False,
                 y_range=None, n_in=3)

Build a vision learner from dls and arch

             Type      Default             Details
dls
arch
normalize    bool      True
n_out        NoneType  None
pretrained   bool      True
weights      NoneType  None
loss_func    NoneType  None
opt_func     function  Adam
lr           float     0.001
splitter     NoneType  None
cbs          NoneType  None
metrics      NoneType  None
path         NoneType  None                learner args
model_dir    str       models
wd           NoneType  None
wd_bn_bias   bool      False
train_bn     bool      True
moms         tuple     (0.95, 0.85, 0.95)
cut          NoneType  None
init         function  kaiming_normal_
custom_head  NoneType  None
concat_pool  bool      True
pool         bool      True                model & head args
lin_ftrs     NoneType  None
ps           float     0.5
first_bn     bool      True
bn_final     bool      False
lin_first    bool      False
y_range      NoneType  None
n_in         int       3

The model is built from arch using the number of final activations inferred from dls if possible (otherwise, pass a value to n_out). It may be pretrained, and the architecture is cut and split using the default metadata of the model architecture (this can be customized by passing a cut or a splitter).

If normalize and pretrained are True, this function adds a Normalization transform to the dls (if there is not already one) using the statistics of the pretrained model. That way, you won’t ever forget to normalize your data in transfer learning.

All other arguments are passed to Learner.

Starting with version 0.13, TorchVision supports multiple pretrained weights for the same model architecture. The vision_learner defaults of pretrained=True, weights=None will use the architecture’s default weights (currently IMAGENET1K_V2 for the ResNet50 used below). If you are using an older version of TorchVision or creating a timm model, setting weights has no effect.

from torchvision.models import ResNet50_Weights

# Legacy weights with accuracy 76.130%
vision_learner(models.resnet50, pretrained=True, weights=ResNet50_Weights.IMAGENET1K_V1, ...)

# New weights with accuracy 80.858%. Strings are also supported.
vision_learner(models.resnet50, pretrained=True, weights='IMAGENET1K_V2', ...)

# Best available weights (currently an alias for IMAGENET1K_V2).
# Default weights if vision_learner weights isn't set.
vision_learner(models.resnet50, pretrained=True, weights=ResNet50_Weights.DEFAULT, ...)

# No weights - random initialization
vision_learner(models.resnet50, pretrained=False, weights=None, ...)

The examples above show how to use the new TorchVision 0.13 multi-weight API with vision_learner.

path = untar_data(URLs.PETS)
fnames = get_image_files(path/"images")
pat = r'^(.*)_\d+.jpg$'
dls = ImageDataLoaders.from_name_re(path, fnames, pat, item_tfms=Resize(224))
learn = vision_learner(dls, models.resnet18, loss_func=CrossEntropyLossFlat(), ps=0.25)

If you pass a str to arch, then a timm model will be created:

dls = ImageDataLoaders.from_name_re(path, fnames, pat, item_tfms=Resize(224))
learn = vision_learner(dls, 'convnext_tiny', loss_func=CrossEntropyLossFlat(), ps=0.25)

source

create_unet_model

 create_unet_model (arch, n_out, img_size, pretrained=True, weights=None,
                    cut=None, n_in=3, blur=False, blur_final=True,
                    self_attention=False, y_range=None, last_cross=True,
                    bottle=False, act_cls=<class
                    'torch.nn.modules.activation.ReLU'>, init=<function
                    kaiming_normal_>, norm_type=None)

Create custom unet architecture

tst = create_unet_model(models.resnet18, 10, (24,24), True, n_in=1)

source

unet_learner

 unet_learner (dls, arch, normalize=True, n_out=None, pretrained=True,
               weights=None, config=None, loss_func=None,
               opt_func=<function Adam>, lr=0.001, splitter=None,
               cbs=None, metrics=None, path=None, model_dir='models',
               wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85,
               0.95), cut=None, n_in=3, blur=False, blur_final=True,
               self_attention=False, y_range=None, last_cross=True,
               bottle=False, act_cls=<class
               'torch.nn.modules.activation.ReLU'>, init=<function
               kaiming_normal_>, norm_type=None)

Build a unet learner from dls and arch

                Type      Default             Details
dls
arch
normalize       bool      True
n_out           NoneType  None
pretrained      bool      True
weights         NoneType  None
config          NoneType  None
loss_func       NoneType  None
opt_func        function  Adam
lr              float     0.001
splitter        NoneType  None
cbs             NoneType  None
metrics         NoneType  None
path            NoneType  None                learner args
model_dir       str       models
wd              NoneType  None
wd_bn_bias      bool      False
train_bn        bool      True
moms            tuple     (0.95, 0.85, 0.95)
cut             NoneType  None
n_in            int       3
blur            bool      False
blur_final      bool      True
self_attention  bool      False
y_range         NoneType  None
last_cross      bool      True
bottle          bool      False
act_cls         type      ReLU
init            function  kaiming_normal_
norm_type       NoneType  None

The model is built from arch using the number of final filters inferred from dls if possible (otherwise, pass a value to n_out). It may be pretrained, and the architecture is cut and split using the default metadata of the model architecture (this can be customized by passing a cut or a splitter).

If normalize and pretrained are True, this function adds a Normalization transform to the dls (if there is not already one) using the statistics of the pretrained model. That way, you won’t ever forget to normalize your data in transfer learning.

All other arguments are passed to Learner.

unet_learner also supports TorchVision’s new multi-weight API via weights. See vision_learner for more details.
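
For instance (a sketch mirroring the vision_learner examples above, with the remaining arguments elided):

from torchvision.models import ResNet34_Weights

# Pretrained ImageNet weights for the resnet34 backbone
unet_learner(dls, models.resnet34, pretrained=True, weights=ResNet34_Weights.IMAGENET1K_V1, ...)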

path = untar_data(URLs.CAMVID_TINY)
fnames = get_image_files(path/'images')
def label_func(x): return path/'labels'/f'{x.stem}_P{x.suffix}'
codes = np.loadtxt(path/'codes.txt', dtype=str)
dls = SegmentationDataLoaders.from_label_func(path, fnames, label_func, codes=codes)
learn = unet_learner(dls, models.resnet34, loss_func=CrossEntropyLossFlat(axis=1), y_range=(0,1))

source

create_cnn_model

 create_cnn_model (*args, **kwargs)

Deprecated name for create_vision_model – do not use


source

cnn_learner

 cnn_learner (*args, **kwargs)

Deprecated name for vision_learner – do not use