Vision learner

All the functions necessary to build a Learner suitable for transfer learning in computer vision

The most important functions of this module are vision_learner and unet_learner. They will help you define a Learner using a pretrained model. See the vision tutorial for examples of use.

Cut a pretrained model

By default, the fastai library cuts a pretrained model at the pooling layer. This function helps detect it.

has_pool_type

 has_pool_type (m)

Return True if m is a pooling layer or has one in its children

m = nn.Sequential(nn.AdaptiveAvgPool2d(5), nn.Linear(2,3), nn.Conv2d(2,3,1), nn.MaxPool3d(5))
assert has_pool_type(m)
test_eq([has_pool_type(m_) for m_ in m.children()], [True,False,False,True])
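
The check above can be approximated with a short recursive function. The following is a minimal sketch, not necessarily the library's implementation: the names is_pool_layer and contains_pool are hypothetical, and it assumes a pooling layer can be recognised from a class name ending in Pool1d, Pool2d or Pool3d.

```python
import re
import torch.nn as nn

def is_pool_layer(layer: nn.Module) -> bool:
    # Assumption: pooling layers have class names such as AvgPool2d,
    # AdaptiveAvgPool2d or MaxPool3d, i.e. ending in Pool1d/Pool2d/Pool3d.
    return re.search(r"Pool[123]d$", type(layer).__name__) is not None

def contains_pool(module: nn.Module) -> bool:
    # True if the module itself is a pooling layer or any descendant is one.
    if is_pool_layer(module):
        return True
    return any(contains_pool(child) for child in module.children())

m = nn.Sequential(nn.AdaptiveAvgPool2d(5), nn.Linear(2, 3), nn.Conv2d(2, 3, 1), nn.MaxPool3d(5))
flags = [contains_pool(child) for child in m.children()]
```

Because the function recurses through children(), a container such as nn.Sequential counts as having a pooling layer as soon as one of its nested modules does.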

cut_model

 cut_model (model, cut)

Cut an instantiated model

create_body

 create_body (model, n_in=3, pretrained=True, cut=None)

Cut off the body of a typically pretrained arch as determined by cut

cut can be either an integer, in which case the model is cut at the corresponding layer, or a function, in which case create_body returns cut(model). If cut is None, the cut point is found by looking for a layer that contains pooling (see has_pool_type).

def tst(): return nn.Sequential(nn.Conv2d(3,5,3), nn.BatchNorm2d(5), nn.AvgPool2d(1), nn.Linear(3,4))
m = create_body(tst())
test_eq(len(m), 2)

m = create_body(tst(), cut=3)
test_eq(len(m), 3)

m = create_body(tst(), cut=noop)
test_eq(len(m), 4)

for n in range(1,5):    
    m = create_body(tst(), n_in=n)
    test_eq(_get_first_layer(m)[0].in_channels, n)
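
The two explicit forms of cut can be pictured with a small dispatch function. This is only a sketch under stated assumptions: cut_layers and toy_model are hypothetical names, the default cut=None case is left out, and the real cut_model may differ in its details.

```python
import torch.nn as nn

def cut_layers(model: nn.Module, cut):
    # Integer: keep the first `cut` top-level children of the model.
    if isinstance(cut, int):
        return nn.Sequential(*list(model.children())[:cut])
    # Callable: let the caller decide what the body is.
    if callable(cut):
        return cut(model)
    raise TypeError("cut must be an int or a callable")

def toy_model():
    return nn.Sequential(nn.Conv2d(3, 5, 3), nn.BatchNorm2d(5), nn.AvgPool2d(1), nn.Linear(3, 4))

first_three = cut_layers(toy_model(), 3)         # Conv2d, BatchNorm2d, AvgPool2d
untouched = cut_layers(toy_model(), lambda m: m)  # identity function keeps all 4 layers
```

Passing a function is useful when the body is not a simple prefix of the model's children, since the function receives the whole model and can return any module.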

Head and model

create_head

 create_head (nf, n_out, lin_ftrs=None, ps=0.5, pool=True,
              concat_pool=True, first_bn=True, bn_final=False,
              lin_first=False, y_range=None)

Model head that takes nf features, runs them through lin_ftrs, and outputs n_out classes.

The head begins with fastai’s AdaptiveConcatPool2d if concat_pool=True; otherwise, it uses traditional average pooling. A Flatten layer follows, then blocks of BatchNorm, Dropout and Linear layers (if lin_first=True, the order within each block is Linear, BatchNorm, Dropout).
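
To see why concatenated pooling changes the number of features, here is a minimal sketch of a layer that concatenates adaptive max pooling and adaptive average pooling along the channel dimension. ConcatPoolSketch is a hypothetical stand-in written for illustration, not the library class itself.

```python
import torch
import torch.nn as nn

class ConcatPoolSketch(nn.Module):
    "Illustrative stand-in: concatenates adaptive max and average pooling."
    def __init__(self, size: int = 1):
        super().__init__()
        self.ap = nn.AdaptiveAvgPool2d(size)
        self.mp = nn.AdaptiveMaxPool2d(size)

    def forward(self, x):
        # Concatenate along the channel dimension, doubling the channel count.
        return torch.cat([self.mp(x), self.ap(x)], dim=1)

x = torch.randn(2, 5, 7, 7)              # batch of 2, 5 channels, 7x7 feature maps
concat_out = ConcatPoolSketch()(x)       # 10 channels after concatenation
avg_out = nn.AdaptiveAvgPool2d(1)(x)     # 5 channels with plain average pooling
```

The doubling is consistent with the create_head(5, 10) output in this section, where the first BatchNorm1d has 10 features although nf is 5.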

Those blocks start at nf, pass through every element of lin_ftrs (defaults to [512]), and end at n_out. ps is a list of dropout probabilities; if you pass a single value, half of that value is used for every block except the last, which uses the value itself.

If first_bn=True, a BatchNorm layer is added just after the pooling operations. If bn_final=True, a final BatchNorm layer is added. If y_range is passed, the function adds a SigmoidRange to that range.

tst = create_head(5, 10)
tst

Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): fastai.layers.Flatten(full=False)
  (2): BatchNorm1d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25, inplace=False)
  (4): Linear(in_features=10, out_features=512, bias=False)
  (5): ReLU(inplace=True)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5, inplace=False)
  (8): Linear(in_features=512, out_features=10, bias=False)
)
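
The sizes and dropout values in the output above follow from the rules described for lin_ftrs and ps. The sketch below reproduces that bookkeeping in plain Python; plan_head is a hypothetical helper, and it simplifies by assuming pooling is enabled and by ignoring first_bn, bn_final, lin_first and y_range.

```python
def plan_head(nf, n_out, lin_ftrs=None, ps=0.5, concat_pool=True):
    "Return one (dropout, in_features, out_features) tuple per linear block."
    # Concatenated pooling doubles the number of incoming features.
    if concat_pool:
        nf *= 2
    hidden = [512] if lin_ftrs is None else list(lin_ftrs)
    sizes = [nf] + hidden + [n_out]
    n_blocks = len(sizes) - 1
    ps = list(ps) if isinstance(ps, (list, tuple)) else [ps]
    # A single value is halved for every block except the last one.
    if len(ps) == 1:
        ps = [ps[0] / 2] * (n_blocks - 1) + ps
    return list(zip(ps, sizes[:-1], sizes[1:]))

plan = plan_head(5, 10)
deeper = plan_head(5, 10, lin_ftrs=[256, 128], ps=0.4)
```

With the defaults, plan contains two blocks, (0.25, 10, 512) and (0.5, 512, 10), which matches the Dropout and Linear layers printed above.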

#TODO: refactor, i.e. something like this?
# class ModelSplitter():
#     def __init__(self, idx): self.idx = idx
#     def split(self, m): return L(m[:self.idx], m[self.idx:]).map(params)
#     def __call__(self,): return {'cut':self.idx, 'split':self.split}

default_split

 default_split (m)

Default split of a model between body and head

To do transfer learning, you need to pass a splitter to Learner. This should be a function that takes the model and returns a collection of parameter groups, e.g. a list of lists of parameters.
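
As an illustration of what such a splitter can look like, the sketch below splits a two-part model into a body group and a head group. It assumes the model is an nn.Sequential whose first child is the body and whose remaining children form the head; split_body_head is a hypothetical name, not a library function.

```python
import torch.nn as nn

def split_body_head(model: nn.Sequential):
    # First parameter group: everything in the body (model[0]).
    body_params = list(model[0].parameters())
    # Second parameter group: everything after the body.
    head_params = list(model[1:].parameters())
    return [body_params, head_params]

body = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
head = nn.Sequential(nn.Flatten(), nn.Linear(8, 2))
model = nn.Sequential(body, head)

groups = split_body_head(model)
```

Each group can then be frozen or given its own learning rate, which is what makes the body/head split useful for transfer learning.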

add_head

 add_head (body, nf, n_out, init=<function kaiming_normal_>, head=None,
           concat_pool=True, pool=True, lin_ftrs=None, ps=0.5,
           first_bn=True, bn_final=False, lin_first=False, y_range=None)

Add a head to a vision body

create_vision_model

 create_vision_model (arch, n_out, pretrained=True, weights=None,
                      cut=None, n_in=3, init=<function kaiming_normal_>,
                      custom_head=None, concat_pool=True, pool=True,
                      lin_ftrs=None, ps=0.5, first_bn=True,
                      bn_final=False, lin_first=False, y_range=None)

Create custom vision architecture

The model is cut according to cut. If it is pretrained, the proper set of weights is downloaded and then loaded. init is applied to the head of the model, which is either created by create_head (with lin_ftrs, ps, concat_pool, bn_final, lin_first and y_range) or is custom_head.

tst = create_vision_model(models.resnet18, 10, True)
tst = create_vision_model(models.resnet18, 10, True, n_in=1)

TimmBody

 TimmBody (model, pretrained:bool=True, cut=None, n_in:int=3)

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will also have their parameters converted when you call to(), etc.

Note: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

training (bool): whether this module is in training or evaluation mode.

create_timm_model

 create_timm_model (arch, n_out, cut=None, pretrained=True, n_in=3,
                    init=<function kaiming_normal_>, custom_head=None,
                    concat_pool=True, pool=True, lin_ftrs=None, ps=0.5,
                    first_bn=True, bn_final=False, lin_first=False,
                    y_range=None, **kwargs)

Create custom architecture using arch, n_in and n_out from the timm library

# make sure that timm models can be scripted:
tst, _ = create_timm_model('resnet34', 1)
scripted = torch.jit.script(tst)
assert scripted, "model could not be converted to TorchScript"

Learner convenience functions

vision_learner

 vision_learner (dls, arch, normalize=True, n_out=None, pretrained=True,
                 weights=None, loss_func=None, opt_func=<function Adam>,
                 lr=0.001, splitter=None, cbs=None, metrics=None,
                 path=None, model_dir='models', wd=None, wd_bn_bias=False,
                 train_bn=True, moms=(0.95, 0.85, 0.95), cut=None,
                 init=<function kaiming_normal_>, custom_head=None,
                 concat_pool=True, pool=True, lin_ftrs=None, ps=0.5,
                 first_bn=True, bn_final=False, lin_first=False,
                 y_range=None, n_in=3)

Build a vision learner from dls and arch

Parameter	Type	Default	Details
dls
arch
normalize	bool	True
n_out	NoneType	None
pretrained	bool	True
weights	NoneType	None
loss_func	NoneType	None
opt_func	function	Adam
lr	float	0.001
splitter	NoneType	None
cbs	NoneType	None
metrics	NoneType	None
path	NoneType	None	learner args
model_dir	str	models
wd	NoneType	None
wd_bn_bias	bool	False
train_bn	bool	True
moms	tuple	(0.95, 0.85, 0.95)
cut	NoneType	None
init	function	kaiming_normal_
custom_head	NoneType	None
concat_pool	bool	True
pool	bool	True	model & head args
lin_ftrs	NoneType	None
ps	float	0.5
first_bn	bool	True
bn_final	bool	False
lin_first	bool	False
y_range	NoneType	None
n_in	int	3

The model is built from arch using the number of final activations inferred from dls if possible (otherwise pass a value to n_out). It might be pretrained and the architecture is cut and split using the default metadata of the model architecture (this can be customized by passing a cut or a splitter).
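
One plausible way to picture the inference of the number of outputs is shown below. This is a simplified sketch, not the library's logic: infer_n_out is a hypothetical helper, and it assumes the data object exposes either a c attribute (number of classes) or a vocab attribute (list of class names).

```python
from types import SimpleNamespace

def infer_n_out(dls, n_out=None):
    # An explicit value always wins.
    if n_out is not None:
        return n_out
    # Assumption: the data object may carry the class count directly...
    c = getattr(dls, "c", None)
    if c:
        return c
    # ...or a vocabulary whose length gives the class count.
    vocab = getattr(dls, "vocab", None)
    if vocab:
        return len(vocab)
    raise ValueError("n_out could not be inferred from the data; pass n_out explicitly")

from_c = infer_n_out(SimpleNamespace(c=37))
from_vocab = infer_n_out(SimpleNamespace(vocab=["cat", "dog"]))
explicit = infer_n_out(SimpleNamespace(c=37), n_out=5)
```

When neither attribute is available, as with some custom data pipelines, the explicit n_out argument is the fallback.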

If normalize and pretrained are True, this function adds a Normalization transform to the dls (if there is not already one) using the statistics of the pretrained model. That way, you won’t ever forget to normalize your data in transfer learning.
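
As a rough illustration of what normalizing with the pretrained model's statistics means, the sketch below applies per-channel normalization to a single RGB pixel. It assumes the model was pretrained on ImageNet and uses the commonly published ImageNet channel means and standard deviations; normalize_pixel and denormalize_pixel are hypothetical helpers, not library functions.

```python
# Commonly published ImageNet statistics for RGB inputs scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    # Subtract the channel mean and divide by the channel standard deviation.
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    # Inverse operation, useful when displaying normalized images.
    return tuple(v * s + m for v, m, s in zip(rgb, mean, std))

centered = normalize_pixel(IMAGENET_MEAN)            # a mean-coloured pixel maps to zeros
round_trip = denormalize_pixel(normalize_pixel((0.2, 0.5, 0.9)))
```

Feeding a pretrained network inputs on the same scale it saw during pretraining is the reason the statistics come from the pretrained model rather than from your own dataset.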

All other arguments are passed to Learner.

Starting with version 0.13, TorchVision supports multiple pretrained weights for the same model architecture. The vision_learner default of pretrained=True, weights=None will use the architecture’s default weights, which are currently IMAGENET1K_V2. If you are using an older version of TorchVision or creating a timm model, setting weights will have no effect.

from torchvision.models import ResNet50_Weights

# dls is any image DataLoaders, such as the one built in the next example.

# Legacy weights with accuracy 76.130%
vision_learner(dls, models.resnet50, pretrained=True, weights=ResNet50_Weights.IMAGENET1K_V1)

# New weights with accuracy 80.858%. Strings are also supported.
vision_learner(dls, models.resnet50, pretrained=True, weights='IMAGENET1K_V2')

# Best available weights (currently an alias for IMAGENET1K_V2).
# Default weights if vision_learner weights isn't set.
vision_learner(dls, models.resnet50, pretrained=True, weights=ResNet50_Weights.DEFAULT)

# No weights - random initialization
vision_learner(dls, models.resnet50, pretrained=False, weights=None)

The example above shows how to use the new TorchVision 0.13 multi-weight API with vision_learner.

path = untar_data(URLs.PETS)
fnames = get_image_files(path/"images")
pat = r'^(.*)_\d+\.jpg$'
dls = ImageDataLoaders.from_name_re(path, fnames, pat, item_tfms=Resize(224))

learn = vision_learner(dls, models.resnet18, loss_func=CrossEntropyLossFlat(), ps=0.25)

If you pass a str to arch, then a timm model will be created:

dls = ImageDataLoaders.from_name_re(path, fnames, pat, item_tfms=Resize(224))
learn = vision_learner(dls, 'convnext_tiny', loss_func=CrossEntropyLossFlat(), ps=0.25)

create_unet_model

 create_unet_model (arch, n_out, img_size, pretrained=True, weights=None,
                    cut=None, n_in=3, blur=False, blur_final=True,
                    self_attention=False, y_range=None, last_cross=True,
                    bottle=False, act_cls=<class
                    'torch.nn.modules.activation.ReLU'>, init=<function
                    kaiming_normal_>, norm_type=None)

Create custom unet architecture

tst = create_unet_model(models.resnet18, 10, (24,24), True, n_in=1)

unet_learner

 unet_learner (dls, arch, normalize=True, n_out=None, pretrained=True,
               weights=None, config=None, loss_func=None,
               opt_func=<function Adam>, lr=0.001, splitter=None,
               cbs=None, metrics=None, path=None, model_dir='models',
               wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85,
               0.95), cut=None, n_in=3, blur=False, blur_final=True,
               self_attention=False, y_range=None, last_cross=True,
               bottle=False, act_cls=<class
               'torch.nn.modules.activation.ReLU'>, init=<function
               kaiming_normal_>, norm_type=None)

Build a unet learner from dls and arch

Parameter	Type	Default	Details
dls
arch
normalize	bool	True
n_out	NoneType	None
pretrained	bool	True
weights	NoneType	None
config	NoneType	None
loss_func	NoneType	None
opt_func	function	Adam
lr	float	0.001
splitter	NoneType	None
cbs	NoneType	None
metrics	NoneType	None
path	NoneType	None	learner args
model_dir	str	models
wd	NoneType	None
wd_bn_bias	bool	False
train_bn	bool	True
moms	tuple	(0.95, 0.85, 0.95)
cut	NoneType	None
n_in	int	3
blur	bool	False
blur_final	bool	True
self_attention	bool	False
y_range	NoneType	None
last_cross	bool	True
bottle	bool	False
act_cls	type	ReLU
init	function	kaiming_normal_
norm_type	NoneType	None

The model is built from arch using the number of final filters inferred from dls if possible (otherwise pass a value to n_out). It might be pretrained and the architecture is cut and split using the default metadata of the model architecture (this can be customized by passing a cut or a splitter).

All other arguments are passed to Learner.

unet_learner also supports TorchVision’s new multi-weight API via weights. See vision_learner for more details.

path = untar_data(URLs.CAMVID_TINY)
fnames = get_image_files(path/'images')
def label_func(x): return path/'labels'/f'{x.stem}_P{x.suffix}'
codes = np.loadtxt(path/'codes.txt', dtype=str)
dls = SegmentationDataLoaders.from_label_func(path, fnames, label_func, codes=codes)

learn = unet_learner(dls, models.resnet34, loss_func=CrossEntropyLossFlat(axis=1), y_range=(0,1))

create_cnn_model

 create_cnn_model (*args, **kwargs)

Deprecated name for create_vision_model – do not use

cnn_learner

 cnn_learner (*args, **kwargs)

Deprecated name for vision_learner – do not use
