from fastai.vision.all import *
This tutorial highlights how to quickly build a Learner and fine-tune a pretrained model on most computer vision tasks.
For this task, we will use the Oxford-IIIT Pet Dataset, which contains images of cats and dogs of 37 different breeds. We will first show how to build a simple cat-vs-dog classifier, then a slightly more advanced model that can classify all breeds.
The dataset can be downloaded and decompressed with this line of code:
path = untar_data(URLs.PETS)
It will only do this download once, and return the location of the decompressed archive. We can check what is inside with the
We will ignore the annotations folder for now, and focus on the images one.
get_image_files is a fastai function that helps us grab all the image files (recursively) in one folder.
files = get_image_files(path/"images")
len(files)
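To make the recursive search concrete, here is a minimal standard-library sketch of what such a helper does. It is not fastai's implementation: the function name find_image_files, the suffix set, and the temporary folder layout are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; fastai builds its own list differently.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def find_image_files(root):
    """Return every image file under `root`, searching subfolders recursively."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

def build_demo_folder(root):
    """Create a tiny, made-up folder tree to search (hypothetical file names)."""
    root = Path(root)
    (root / "images" / "extra").mkdir(parents=True)
    for rel in ["images/a_1.jpg", "images/extra/b_2.png", "images/notes.txt"]:
        (root / rel).touch()

with tempfile.TemporaryDirectory() as tmp:
    build_demo_folder(tmp)
    found = find_image_files(Path(tmp) / "images")
    found_names = [p.name for p in found]
    print(found_names)
```

The text file is skipped and the image inside the nested folder is picked up, which is the behaviour the word "recursively" refers to.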
To label our data for the cats vs dogs problem, we need to know which filenames are of dog pictures and which ones are of cat pictures. There is an easy way to distinguish them: the name of the file begins with a capital letter for cats, and a lowercase letter for dogs.
We can then define an easy label function:
def label_func(f): return f[0].isupper()
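A quick sanity check of this rule on a few file names can catch labelling mistakes early. The names below are made up in the style of the dataset, and first_letter_is_upper is a hypothetical stand-in for the label function that tests only the first character.

```python
def first_letter_is_upper(filename):
    """Return True (cat) when the file name starts with a capital letter."""
    return filename[0].isupper()

# Made-up file names in the style of the dataset (assumption, not real listings).
samples = ["Abyssinian_1.jpg", "great_pyrenees_173.jpg", "Bengal_42.jpg"]
labels = {name: first_letter_is_upper(name) for name in samples}

for name, is_cat in labels.items():
    print(f"{name}: {'cat' if is_cat else 'dog'}")

# Calling isupper() on the whole name is False for every sample because of the
# lowercase letters, which is why only the first character is tested.
whole_name_check = [name.isupper() for name in samples]
```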
To get our data ready for a model, we need to put it in a DataLoaders object. Here we have a function that labels using the file names, so we will use ImageDataLoaders.from_name_func. There are other factory methods of ImageDataLoaders that could be more suitable for your problem, so make sure to check them all.
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))
We have passed to this function the directory we're working in, the files we grabbed, our label_func and one last piece as item_tfms: this is a Transform applied to all items of our dataset that will resize each image to 224 by 224, by using a random crop on the largest dimension to make it a square, then resizing to 224 by 224. If we didn't pass this, we would get an error later as it would be impossible to batch the items together.
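The crop-then-resize step can be illustrated with plain arithmetic. The sketch below only computes the square crop box for an image of a given width and height; it is an illustration under assumed names (square_crop_box, offset_frac), not how fastai's Resize is implemented.

```python
def square_crop_box(width, height, offset_frac=0.5):
    """Return (left, top, right, bottom) of a square crop along the largest dimension.

    offset_frac in [0, 1] picks where the square sits along the longer side:
    0.5 is a centered crop, while a random value gives a random crop.
    """
    side = min(width, height)
    slack_w = width - side   # extra pixels to discard horizontally
    slack_h = height - side  # extra pixels to discard vertically
    left = int(slack_w * offset_frac)
    top = int(slack_h * offset_frac)
    return (left, top, left + side, top + side)

# A 500x375 landscape image: the square has side 375 and slides horizontally.
center_box = square_crop_box(500, 375)
print(center_box)

# Every cropped square is then scaled to 224x224, so all items share one shape
# and can be stacked into a batch.
target_size = (224, 224)
```

Because every item ends up with the same height and width, the images can be stacked into a single batch tensor, which is what fails when no resizing transform is passed.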
We can then check if everything looks okay with the show_batch method (True is for cat, False is for dog).
Then we can create a Learner, which is a fastai object that combines the data and a model for training, and uses transfer learning to fine-tune a pretrained model in just two lines of code:
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
The first line downloaded a model called ResNet34, pretrained on ImageNet, and adapted it to our specific problem. The second line then fine-tuned that model, and in a relatively short time we get a model with an error rate of well under 1%... amazing!
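For reference, the error rate is the fraction of validation items the model classifies incorrectly, that is, one minus the accuracy. A minimal sketch with made-up predictions follows; the function name and the data are illustrative assumptions, not fastai's metric code.

```python
def error_rate_sketch(predicted, actual):
    """Fraction of items whose predicted class differs from the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)

# Made-up class indices for eight validation images (one of them is wrong).
predicted = [0, 1, 1, 0, 1, 0, 0, 1]
actual = [0, 1, 0, 0, 1, 0, 0, 1]
rate = error_rate_sketch(predicted, actual)
print(f"error rate: {rate:.3f}")
```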
If you want to make a prediction on a new image, you can use the predict method:
('False', TensorImage(0), TensorImage([9.9998e-01, 2.0999e-05]))
The predict method returns three things: the decoded prediction (here False for dog), the index of the predicted class, and the tensor of probabilities of all classes in the order of their indexed labels (in this case, the model is quite confident about the image being that of a dog). This method accepts a filename, a PIL image or a tensor directly in this case.
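How the three returned values relate to each other can be sketched in plain Python. The vocabulary order [False, True] and the helper name decode_prediction are assumptions for illustration; the probabilities are the ones shown in the output above.

```python
def decode_prediction(probabilities, vocab):
    """Return (decoded label, class index, probabilities) for one item."""
    # The predicted class is the index with the highest probability.
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return str(vocab[index]), index, probabilities

vocab = [False, True]             # assumed class order: index 0 = dog, 1 = cat
probs = [9.9998e-01, 2.0999e-05]  # probabilities from the example output
decoded, index, returned_probs = decode_prediction(probs, vocab)
print(decoded, index)
```

The probabilities are listed in vocabulary order, so the first entry is the model's confidence in False (dog) and the second in True (cat).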
We can also have a look at some predictions with the
Check out the other applications like text or tabular, or the other problems covered in this tutorial, and you will see they all share a consistent API for gathering the data and looking at it, creating a Learner, training the model and looking at some predictions.
To label our data with the breed name, we will use a regular expression to extract it from the filename. Looking back at a filename, we have:
so the class is everything before the last _ followed by some digits. A regular expression that will catch the name is thus:
pat = r'^(.*)_\d+\.jpg$'
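Before handing a pattern like this to a factory method, it can be tried on a few names with Python's re module. The sketch below uses a pattern of this shape (written here with the dot escaped and an end anchor) on made-up file names; breed_from_name is a hypothetical helper, not part of fastai.

```python
import re

# Same shape as the pattern above, with the dot escaped and an end anchor.
breed_pattern = re.compile(r'^(.*)_\d+\.jpg$')

def breed_from_name(filename):
    """Extract the breed (everything before the final _<digits>.jpg)."""
    match = breed_pattern.search(filename)
    if match is None:
        raise ValueError(f"no breed found in {filename!r}")
    return match.group(1)

# Made-up file names in the style of the dataset (assumption).
samples = ["great_pyrenees_173.jpg", "Abyssinian_1.jpg", "yorkshire_terrier_9.jpg"]
breeds = [breed_from_name(name) for name in samples]
print(breeds)
```

The greedy `.*` is what keeps multi-word breeds such as great_pyrenees intact: it consumes as much as possible and only gives back the final underscore and digits.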
Since it's pretty common to use regular expressions to label the data (often, labels are hidden in the file names), there is a factory method to do just that:
dls = ImageDataLoaders.from_name_re(path, files, pat, item_tfms=Resize(224))
Like before, we can then use show_batch to have a look at our data: