Basic functions using pytorch

Torch Core

This module contains all the basic functions we need in other modules of the fastai library (as opposed to core, which contains the ones not requiring PyTorch). Its documentation can easily be skipped on a first read, unless you want to know what a given function does.

Global constants

AdamW = partial(optim.Adam, betas=(0.9,0.99))

bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

defaults.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

If you are trying to make fastai run on the CPU, simply change the default device: defaults.device = 'cpu'.

Alternatively, if not using wildcard imports: fastai.torch_core.defaults.device = 'cpu'.

Functions that operate conversions


batch_to_half(b:Collection[Tensor]) → Collection[Tensor]

Set the input of batch b to half precision.
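In fastai a batch is an (input, target) pair, and only the input is converted. A minimal sketch of this behaviour (a hypothetical re-implementation, not fastai's exact code):

```python
import torch

def batch_to_half_sketch(b):
    # Convert only the input x of an (x, y) batch to FP16;
    # the target y keeps its original dtype.
    x, y = b
    return [x.half(), y]

x = torch.randn(4, 3)            # FP32 inputs
y = torch.tensor([0, 1, 0, 1])   # int64 targets
xh, yh = batch_to_half_sketch([x, y])
print(xh.dtype, yh.dtype)  # torch.float16 torch.int64
```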



flatten_model(m:Module)

Flattens all the layers of m into a list. This allows for easy access to the layers of the model and lets you manipulate the model as if it were a flat list of layers.

m = simple_cnn([3,6,12])
m
Sequential(
  (0): Sequential(
    (0): Conv2d(3, 6, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU(inplace)
  )
  (1): Sequential(
    (0): Conv2d(6, 12, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): ReLU(inplace)
  )
  (2): Sequential(
    (0): AdaptiveAvgPool2d(output_size=1)
    (1): Flatten()
  )
)
flatten_model(m)
[Conv2d(3, 6, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
 ReLU(inplace),
 Conv2d(6, 12, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
 ReLU(inplace),
 AdaptiveAvgPool2d(output_size=1),
 Flatten()]


model2half(model:Module) → Module

Convert model to half precision except the batchnorm layers.

Converting model parameters to half precision allows us to leverage fast FP16 arithmetic which can speed up the computations by 2-8 times. It also reduces memory consumption allowing us to train deeper models.

Note: Batchnorm layers are not converted to half precision as that may lead to instability in training.

m = simple_cnn([3,6,12], bn=True)

def show_params_dtype(state_dict):
    """Simple function to pretty print the dtype of the model params"""
    for wt_name, param in state_dict.items():
        print("{:<30}: {}".format(wt_name, str(param.dtype)))

print("dtypes of model parameters before model2half: ")
show_params_dtype(m.state_dict())

# Converting model to half precision
m_half = model2half(m)

print("dtypes of model parameters after model2half: ")
show_params_dtype(m_half.state_dict())
dtypes of model parameters before model2half: 
0.0.weight                    : torch.float32
0.2.weight                    : torch.float32
0.2.bias                      : torch.float32
0.2.running_mean              : torch.float32
0.2.running_var               : torch.float32
0.2.num_batches_tracked       : torch.int64
1.0.weight                    : torch.float32
1.0.bias                      : torch.float32

dtypes of model parameters after model2half: 
0.0.weight                    : torch.float16
0.2.weight                    : torch.float32
0.2.bias                      : torch.float32
0.2.running_mean              : torch.float32
0.2.running_var               : torch.float32
0.2.num_batches_tracked       : torch.int64
1.0.weight                    : torch.float16
1.0.bias                      : torch.float16



np2model_tensor(a) → Tensor

Transform the numpy array a to a tensor of the same type.

It is a wrapper on top of PyTorch's torch.as_tensor which converts a numpy array to a torch tensor, and additionally attempts to map all floats to torch.float32 and all integers to torch.int64 for consistency in model data. Below is an example demonstrating its functionality for floating-point numbers; the same applies to integers.

a1 = np.ones((2, 3)).astype(np.float16)
a2 = np.ones((2, 3)).astype(np.float32)
a3 = np.ones((2, 3)).astype(np.float64)

b1 = np2model_tensor(a1) # Maps to torch.float32
b2 = np2model_tensor(a2) # Maps to torch.float32
b3 = np2model_tensor(a3) # Maps to torch.float32

print(f"Datatype of as': {a1.dtype}, {a2.dtype}, {a3.dtype}")
print(f"Datatype of bs': {b1.dtype}, {b2.dtype}, {b3.dtype}")
Datatype of as': float16, float32, float64
Datatype of bs': torch.float32, torch.float32, torch.float32


requires_grad(m:Module, b:Optional[bool]=None) → Optional[bool]

If b is not set return requires_grad of first param, else set requires_grad on all params as b

Performs both getting and setting of the requires_grad attribute of the model's parameters, which decides whether gradients are accumulated for them.

  • If b is None: the function returns the requires_grad of the model's parameters; to be more specific, it returns the requires_grad of the first parameter in the model.

  • Else, if b is passed (a boolean value), requires_grad of all parameters of the model is set to b.

# Any Pytorch model
m = simple_cnn([3, 6, 12], bn=True)

# Get the requires_grad of model
print("requires_grad of model: {}".format(requires_grad(m)))

# Set requires_grad of all params in model to false
requires_grad(m, False)

# Get the requires_grad of model
print("requires_grad of model: {}".format(requires_grad(m)))
requires_grad of model: True
requires_grad of model: False


tensor(x:Any, *rest) → Tensor

Like torch.as_tensor, but handles lists too, and can be passed multiple vector elements directly.

Handy when you want to convert any list-like object to a tensor, initialize your weights manually, or in other similar cases.

NB: When passing multiple vectors, all vectors must have the same dimensions. (Obvious, but easy to forget.)

# Conversion from any numpy array
b = tensor(np.array([1, 2, 3]))
print(b, type(b))

# Passing as multiple parameters
b = tensor(1, 2, 3)
print(b, type(b))

# Passing a single list
b = tensor([1, 2, 3])
print(b, type(b))

# Can work with multiple vectors / lists
b = tensor([1, 2], [3, 4])
print(b, type(b))
tensor([1, 2, 3]) <class 'torch.Tensor'>
tensor([1, 2, 3]) <class 'torch.Tensor'>
tensor([1, 2, 3]) <class 'torch.Tensor'>
tensor([[1, 2],
        [3, 4]]) <class 'torch.Tensor'>



to_cpu(b:ItemsList) → ItemsList

Recursively map lists of tensors in b to the CPU.

A wrapper on top of PyTorch's torch.Tensor.cpu(), which returns a CPU copy of a tensor or of a list of tensors. As described in PyTorch's docs, if a tensor is already on the CPU, the same data is returned and no copy is made.

Useful to convert all the list of parameters of the model to CPU in a single call.

if torch.cuda.is_available():
    a = [torch.randn((1, 1)).cuda() for i in range(3)]
    print(a)
    # Getting a CPU version of the tensors on the GPU
    b = to_cpu(a)
    print(b)
    # Applying to_cpu to a list of tensors already on the CPU
    c = to_cpu(b)
    print(c)
    # The tensors in c are the exact same objects as those in b; no copy was made
    print([x is y for x, y in zip(b, c)])
[tensor([[-0.5932]], device='cuda:0'), tensor([[-0.2867]], device='cuda:0'), tensor([[-1.0616]], device='cuda:0')]
[tensor([[-0.5932]]), tensor([[-0.2867]]), tensor([[-1.0616]])]
[tensor([[-0.5932]]), tensor([[-0.2867]]), tensor([[-1.0616]])]
[True, True, True]



to_data(b:ItemsList)

Recursively map lists of items in b to their wrapped data.

Returns the data attribute of an object (or collection of objects) that inherits from the ItemBase class. Useful to examine the exact values of the data, or to work with the data outside of fastai classes.

# Default example examined

from fastai import *
from fastai.vision import *

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)

# Examine the labels
ys = list(data.y)
print("Category display names: ", [ys[0], ys[-1]])

print("Unique classes internally represented as: ", to_data([ys[0], ys[-1]]))
Category display names:  [Category 3, Category 7]
Unique classes internally represented as:  [0, 1]


to_detach(b:Tensors, cpu:bool=True)

Recursively detach lists of tensors in b; put them on the CPU if cpu=True.
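For a single tensor, the effect is that of torch.Tensor.detach() (the fastai helper also recurses into collections); a minimal illustration:

```python
import torch

a = torch.randn(3, requires_grad=True)
b = a.detach()          # same values, but cut off from the autograd graph
print(a.requires_grad)  # True
print(b.requires_grad)  # False
```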


to_device(b:Tensors, device:device)

Recursively put b on device.
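For a single tensor this is essentially torch.Tensor.to(device), applied recursively through nested collections. A rough sketch (a hypothetical re-implementation, not fastai's exact code), using the CPU as the target device:

```python
import torch

def to_device_sketch(b, device):
    # Recursively move tensors, possibly nested in lists/tuples, to device.
    if isinstance(b, (list, tuple)):
        return [to_device_sketch(o, device) for o in b]
    return b.to(device)

batch = [torch.randn(2, 2), [torch.zeros(3), torch.ones(1)]]
moved = to_device_sketch(batch, torch.device('cpu'))
print(moved[1][0].device.type)  # cpu
```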


to_half(b:Collection[Tensor]) → Collection[Tensor]

Recursively map lists of tensors in b to FP16.

Converts a tensor or list of tensors to FP16, resulting in less memory consumption and faster computation. Integer types are not converted to half precision.

a1 = torch.tensor([1, 2], dtype=torch.int64)
a2 = torch.tensor([1, 2], dtype=torch.int32)
a3 = torch.tensor([1, 2], dtype=torch.int16)
a4 = torch.tensor([1, 2], dtype=torch.float64)
a5 = torch.tensor([1, 2], dtype=torch.float32)
a6 = torch.tensor([1, 2], dtype=torch.float16)

print("dtype of as: ", a1.dtype, a2.dtype, a3.dtype, a4.dtype, a5.dtype, a6.dtype, sep="\t")

b1, b2, b3, b4, b5, b6 = to_half([a1, a2, a3, a4, a5, a6])

print("dtype of bs: ", b1.dtype, b2.dtype, b3.dtype, b4.dtype, b5.dtype, b6.dtype, sep="\t")
dtype of as: 	torch.int64	torch.int32	torch.int16	torch.float64	torch.float32	torch.float16
dtype of bs: 	torch.int64	torch.int32	torch.int16	torch.float16	torch.float16	torch.float16



to_np(x)

Convert a tensor to a numpy array.

Internally moves the data to the CPU, then converts it to the numpy.ndarray equivalent of the torch.Tensor by calling torch.Tensor.numpy().

a = torch.tensor([1, 2], dtype=torch.float64)

if torch.cuda.is_available():
    a = a.cuda()

print(a, type(a), a.device)

b = to_np(a)

print(b, type(b))
tensor([1., 2.], dtype=torch.float64) <class 'torch.Tensor'> cpu
[1. 2.] <class 'numpy.ndarray'>


try_int(o:Any) → Any

Try to convert o to int, default to o if not possible.

# Converts floating point numbers to integer
print(try_int(12.5), type(try_int(12.5)))

# This is a Rank-1 ndarray, which ideally should not be converted to int 
print(try_int(np.array([1.5])), try_int(np.array([1.5])).dtype)

# A numpy array with a single element is converted to int
print(try_int(np.array(1.5)), type(try_int(np.array(1.5))))

print(try_int(torch.tensor(2.5)), type(try_int(torch.tensor(2.5))))

# Strings are not converted to int (of course)
print(try_int("12.5"), type(try_int("12.5")))
12 <class 'int'>
[1.5] float64
1 <class 'int'>
2 <class 'int'>
12.5 <class 'str'>

Functions to deal with model initialization


apply_init(m, init_func:LayerFunc)

Initialize all non-batchnorm layers of m with init_func.


apply_leaf(m:Module, f:LayerFunc)

Apply f to children of m.


cond_init(m:Module, init_func:LayerFunc)

Initialize the non-batchnorm layers of m with init_func.
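A rough sketch of how these helpers fit together (hypothetical re-implementations, not fastai's exact code): a tree walk applies a conditional initializer to every leaf module, skipping batchnorm layers and zeroing biases:

```python
import torch.nn as nn

bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def cond_init_sketch(m, init_func):
    # Initialize m's weight with init_func and zero its bias,
    # unless m is a batchnorm layer.
    if not isinstance(m, bn_types):
        if getattr(m, 'weight', None) is not None: init_func(m.weight)
        if getattr(m, 'bias', None) is not None: m.bias.data.fill_(0.)

def apply_leaf_sketch(m, f):
    # Apply f to every leaf module of m.
    children = list(m.children())
    if not children: f(m)
    for c in children: apply_leaf_sketch(c, f)

model = nn.Sequential(nn.Conv2d(3, 6, 3), nn.BatchNorm2d(6), nn.Linear(6, 2))
apply_leaf_sketch(model, lambda m: cond_init_sketch(m, nn.init.kaiming_normal_))
print(float(model[2].bias.abs().sum()))  # 0.0
```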


in_channels(m:Module) → List[int]

Return the shape of the first weight layer in m.


init_default(m:Module, func:LayerFunc='kaiming_normal_')

Initialize m weights with func and set bias to 0.

Functions to get information of a model


children(m:Module) → ModuleList

Get children of m.



children_and_parameters(m:Module)

Return the children of m and its direct parameters not registered in modules.


first_layer(m:Module) → Module

Retrieve first layer in a module m.


last_layer(m:Module) → Module

Retrieve last layer in a module m.


num_children(m:Module) → int

Get number of children modules in m.


one_param(m:Module) → Tensor

Return the first parameter of m.


range_children(m:Module) → Iterator[int]

Return iterator of len of children of m.


trainable_params(m:Module) → ParamList

Return list of trainable params in m.
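The trainable parameters are simply those with requires_grad=True; a minimal sketch (hypothetical re-implementation):

```python
import torch.nn as nn

def trainable_params_sketch(m):
    # Keep only parameters that will receive gradients.
    return [p for p in m.parameters() if p.requires_grad]

model = nn.Linear(4, 2)
model.bias.requires_grad_(False)  # freeze the bias
print(len(trainable_params_sketch(model)))  # 1
```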

Functions to deal with BatchNorm layers


bn2float(module:Module) → Module

If module is batchnorm don't use half precision.



set_bn_eval(m:Module)

Set bn layers in eval mode for all recursive children of m.


split_no_wd_params(layer_groups:ModuleList) → List[List[Parameter]]

Separate the parameters in layer_groups between batchnorm (bn_types) and bias (bias_types) from the rest.

This is used by the optimizer to determine which parameters weight decay should be applied to when the option bn_wd=False is used in a Learner.
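The split can be sketched roughly as follows (a hypothetical re-implementation, not fastai's exact code): batchnorm parameters and biases go in the "no weight decay" list, everything else in the other:

```python
import torch.nn as nn

bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def split_no_wd_sketch(model):
    # Separate batchnorm parameters and biases from the rest.
    no_wd, wd = [], []
    for m in model.modules():
        if isinstance(m, bn_types):
            no_wd += list(m.parameters(recurse=False))
        else:
            for name, p in m.named_parameters(recurse=False):
                (no_wd if name == 'bias' else wd).append(p)
    return wd, no_wd

model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
wd, no_wd = split_no_wd_sketch(model)
print(len(wd), len(no_wd))  # 1 3
```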

Functions to get random tensors


log_uniform(low, high, size:Optional[List[int]]=None) → FloatOrTensor

Draw 1 or shape=size random floats from uniform dist: min=log(low), max=log(high).

log_uniform(0.5, 2, (8,))
tensor([0.5775, 0.7902, 0.6087, 0.5730, 0.8057, 0.8845, 0.8975, 0.5585])
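The idea is to draw exp(u) with u uniform between log(low) and log(high); a minimal scalar sketch (hypothetical re-implementation in plain Python):

```python
import math
import random

def log_uniform_sketch(low, high):
    # Draw exp(u) where u ~ Uniform(log(low), log(high)); the result
    # lies in [low, high] but is uniform on a log scale.
    return math.exp(random.uniform(math.log(low), math.log(high)))

samples = [log_uniform_sketch(0.5, 2.0) for _ in range(1000)]
print(all(0.5 <= s <= 2.0 for s in samples))  # True
```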


rand_bool(p:float, size:Optional[List[int]]=None) → BoolOrTensor

Draw 1 or shape=size random booleans (True occurring with probability p).

rand_bool(0.5, 8)
tensor([1, 1, 0, 1, 0, 0, 1, 0], dtype=torch.uint8)


uniform(low:Number, high:Number=None, size:Optional[List[int]]=None) → FloatOrTensor

Draw 1 or shape=size random floats from uniform dist: min=low, max=high.

uniform(0, 1, (8,))
tensor([0.6432, 0.3110, 0.7588, 0.7058, 0.7121, 0.8552, 0.3352, 0.2620])


uniform_int(low:int, high:int, size:Optional[List[int]]=None) → IntOrTensor

Generate an int, or a tensor of size ints, between low and high (inclusive).

uniform_int(0, 2, (8,))
tensor([0, 1, 1, 2, 1, 1, 1, 2])

Other functions

class ParameterModule

ParameterModule(p:Parameter) :: Module

Register a lone parameter p in a module.


calc_loss(y_pred:Tensor, y_true:Tensor, loss_func:LossFunction)

Calculate loss between y_pred and y_true using loss_func.


data_collate(batch:ItemsList) → Tensor

Convert batch items to tensor data.



get_model(model:Module)

Return the model, unwrapping it from nn.DataParallel if necessary.


grab_idx(x, i, batch_first:bool=True)

Grab the i-th batch in x, batch_first stating the batch dimension.


logit(x:Tensor) → Tensor

Logit of x, clamped to avoid inf.
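Without clamping, logit(0) and logit(1) are infinite; a sketch of the clamped version (a hypothetical re-implementation, not fastai's exact code, with an assumed eps):

```python
import torch

def logit_sketch(x, eps=1e-7):
    # Clamp x away from 0 and 1 before computing log(x / (1 - x)),
    # so the result is never +-inf.
    x = x.clamp(eps, 1 - eps)
    return (x / (1 - x)).log()

x = torch.tensor([0.0, 0.5, 1.0])
print(torch.isfinite(logit_sketch(x)).all().item())  # True
```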


logit_(x:Tensor) → Tensor

In-place logit of x, clamped to avoid inf.



model_type(dtype)

Return the torch type corresponding to dtype.


np_address(x:ndarray) → int

Address of x in memory.


split_model(model:Module, splits:Collection[Union[Module, ModuleList]], want_idxs:bool=False)

Split model according to the layers in splits.

If splits are layers, the model is split sequentially at those layers (each split layer starts a new group and is not included in the preceding one). If want_idxs is True, the corresponding indexes are also returned. If splits are lists of layers, the model is split according to those groups.


split_model_idx(model:Module, idxs:Collection[int]) → ModuleList

Split model according to the indexes in idxs.
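Splitting by indexes can be sketched as cutting the flat list of children at the given positions (a hypothetical re-implementation, not fastai's exact code):

```python
import torch.nn as nn

def split_model_idx_sketch(model, idxs):
    # Cut the list of children at the given indexes and wrap
    # each group in its own nn.Sequential.
    layers = list(model.children())
    idxs = [0] + idxs + [len(layers)]
    return [nn.Sequential(*layers[i:j]) for i, j in zip(idxs[:-1], idxs[1:])]

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
groups = split_model_idx_sketch(model, [2])
print(len(groups), len(groups[0]), len(groups[1]))  # 2 2 1
```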



arange_of(x)

Create a tensor from range_of(x).