The fastai library has a layered API, as summarized by this diagram:
If you are following this tutorial, you are probably already familiar with the applications; here we will see how they are powered by the high-level and mid-level APIs.
Imagenette is a subset of ImageNet with 10 very different classes. It's great for quick experiments before trying a fleshed-out technique on the full ImageNet dataset. In this tutorial we will show how to train a model on it using the usual high-level APIs, then delve inside the fastai library to show you how to use the mid-level APIs we designed. This way you'll be able to customize your own data collection or training as needed.
This is the most basic way of assembling the data, and it's the one we have presented in all the beginner tutorials, so it should be familiar to you by now.
First, we import everything inside the vision application:
from fastai.vision.all import *
Then we download the dataset, decompress it if needed, and get its location:
path = untar_data(URLs.IMAGENETTE_160)
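If you want to check what we just downloaded, you can inspect that folder (a quick sanity check; path.ls() is fastai's shortcut for listing a directory, and for Imagenette it should show the train and val subfolders):

path.ls()  # lists the dataset folder; it should include 'train' and 'val' subfolders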
We use ImageDataLoaders.from_folder to get everything (since our data is organized in an ImageNet-style format):
dls = ImageDataLoaders.from_folder(path, valid='val',
    item_tfms=RandomResizedCrop(128, min_scale=0.35),
    batch_tfms=Normalize.from_stats(*imagenet_stats))
And we can have a look at our data:

dls.show_batch()
And as we saw in previous tutorials, the get_image_files function helps get all the images in subfolders:
fnames = get_image_files(path)
Then we can start with an empty DataBlock:

dblock = DataBlock()
By itself, a DataBlock is just a blueprint for how to assemble your data. It does not do anything until you pass it a source. You can choose to then convert that source into a Datasets or a DataLoaders by using the DataBlock.datasets or DataBlock.dataloaders methods. Since we haven't done anything to get our data ready for batches, the dataloaders method would fail here, but we can have a look at how the source gets converted into a Datasets. This is where we pass the source of our data, here all of our filenames:
dsets = dblock.datasets(fnames)
dsets.train[0]
By default, the data block API assumes we have an input and a target, which is why we see our filename repeated twice.
The first thing we can do is to use a get_items function to actually assemble our items inside the data block:
dblock = DataBlock(get_items = get_image_files)
The difference is that you then pass as a source the folder with the images and not all the filenames:
dsets = dblock.datasets(path)
dsets.train[0]
Our inputs are ready to be processed as images (since images can be built from filenames), but our target is not. We need to convert that filename to a class name. For this, fastai provides parent_label:
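We can try it on one of our filenames (a quick sketch; parent_label just returns the name of the file's parent directory, here one of the WordNet synset ids):

parent_label(fnames[0])  # returns the parent folder name, e.g. 'n01440764'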
This is not very readable, so, since we can actually write any function we want here, let's convert those obscure labels to something we can read:
lbl_dict = dict(
    n01440764='tench',
    n02102040='English springer',
    n02979186='cassette player',
    n03000684='chain saw',
    n03028079='church',
    n03394916='French horn',
    n03417042='garbage truck',
    n03425413='gas pump',
    n03445777='golf ball',
    n03888257='parachute'
)
def label_func(fname):
    return lbl_dict[parent_label(fname)]
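To make sure the two steps compose as expected, we can call it on a filename (again just a quick check; the exact result depends on which file fnames[0] points to):

label_func(fnames[0])  # e.g. 'tench' for a file under the n01440764 folder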
We can then tell our data block to use it to label our target by passing it as get_y:
dblock = DataBlock(get_items = get_image_files,
                   get_y     = label_func)
dsets = dblock.datasets(path)
dsets.train[0]
Now that our targets are ready, we can tell the data block API the types of our inputs and targets by specifying blocks, here images and categories:

dblock = DataBlock(blocks    = (ImageBlock, CategoryBlock),
                   get_items = get_image_files,
                   get_y     = label_func)
dsets = dblock.datasets(path)
dsets.train[0]
(PILImage mode=RGB size=240x160, TensorCategory(0))
We can see how the DataBlock automatically added the transforms necessary to open the image, and how it changed the category name (here "English springer") to an index (with a special tensor type). To do this, it created a mapping from categories to indices called the "vocab", which we can access this way:

dsets.vocab
(#10) ['English springer','French horn','cassette player','chain saw','church','garbage truck','gas pump','golf ball','parachute','tench']
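To see that mapping in action (a small sketch using the objects defined above), you can go from the integer index back to the readable class name by indexing into the vocab:

img, lbl_idx = dsets.train[0]
dsets.vocab[int(lbl_idx)]  # maps the TensorCategory index back to its class name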
Note that you can mix and match any block for inputs and targets, which is why the API is named the data block API. You can also have more than two blocks (if you have multiple inputs and/or targets): you would just need to pass n_inp to the DataBlock to tell the library how many inputs there are (the rest would be targets), and pass a list of functions to get_y (one per target, to explain how to process each item into its type). See the object detection example below, and the sketch that follows, for such cases.
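For instance, here is a purely illustrative sketch (not part of this tutorial's task) of a block with one input and two targets; it reuses label_func twice just to show the shape of the API:

# Illustrative only: one image input (n_inp=1) and two categorical targets,
# each labeled by its own function in the get_y list.
dblock_multi = DataBlock(blocks    = (ImageBlock, CategoryBlock, CategoryBlock),
                         n_inp     = 1,
                         get_items = get_image_files,
                         get_y     = [label_func, label_func])
dsets_multi = dblock_multi.datasets(path)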
The next step is to control how our validation set is created. We do this by passing a splitter to the DataBlock. For instance, here is how we split by grandparent folder:
dblock = DataBlock(blocks    = (ImageBlock, CategoryBlock),
                   get_items = get_image_files,
                   get_y     = label_func,
                   splitter  = GrandparentSplitter(valid_name='val'))
dsets = dblock.datasets(path)
dsets.train[0]
(PILImage mode=RGB size=160x357, TensorCategory(6))
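If your files aren't organized in folders, fastai ships other splitters. For example, here is a random split (just a sketch of an alternative, not what this tutorial uses):

# Alternative (illustrative): hold out 20% of the items at random
dblock_rand = DataBlock(blocks    = (ImageBlock, CategoryBlock),
                        get_items = get_image_files,
                        get_y     = label_func,
                        splitter  = RandomSplitter(valid_pct=0.2, seed=42))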
The last step is to specify item transforms and batch transforms (the same way we do in the ImageDataLoaders factory methods):
dblock = DataBlock(blocks     = (ImageBlock, CategoryBlock),
                   get_items  = get_image_files,
                   get_y      = label_func,
                   splitter   = GrandparentSplitter(valid_name='val'),
                   item_tfms  = RandomResizedCrop(128, min_scale=0.35),
                   batch_tfms = Normalize.from_stats(*imagenet_stats))
dls = dblock.dataloaders(path)
dls.show_batch()
Note that you can also compose several labeling functions in a Pipeline for get_y; here we chain parent_label with a lookup in lbl_dict, which lets us drop the intermediate label_func:

imagenette = DataBlock(blocks     = (ImageBlock, CategoryBlock),
                       get_items  = get_image_files,
                       get_y      = Pipeline([parent_label, lbl_dict.__getitem__]),
                       splitter   = GrandparentSplitter(valid_name='val'),
                       item_tfms  = RandomResizedCrop(128, min_scale=0.35),
                       batch_tfms = Normalize.from_stats(*imagenet_stats))
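If you want to check what that Pipeline does on its own (a quick sketch with the objects defined above), you can call it directly on a filename:

y_pipe = Pipeline([parent_label, lbl_dict.__getitem__])
y_pipe(fnames[0])  # folder name -> readable label, e.g. 'tench'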
dls = imagenette.dataloaders(path)
dls.show_batch()
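From here, the data is ready for training. As one possible next step (a minimal sketch, assuming a recent fastai where vision_learner is available; this is not the exact training setup used later in the tutorial), you could fit a quick baseline:

# Minimal baseline (sketch): fine-tune a pretrained resnet for one epoch
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(1)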