fastai offers several widgets to support the workflow of a deep learning practitioner. The purpose of the widgets is to help you organize, clean, and prepare your data for your model. Widgets are separated by data type.
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = cnn_learner(data, models.resnet18, metrics=error_rate)
We create a databunch with all the data in the training set and no validation set (DatasetFormatter uses only the training set).
db = (ImageList.from_folder(path)
        .split_none()
        .label_from_folder()
        .databunch())
learn = cnn_learner(db, models.resnet18, metrics=[accuracy])
learn.load('stage-1');
Displays images for relabeling or deletion and saves changes in path as 'cleaned.csv'.
ds, idxs = DatasetFormatter().from_toplosses(learn)
ImageCleaner(ds, idxs, path)
<fastai.widgets.image_cleaner.ImageCleaner at 0x7f9da3659b00>
ImageCleaner does not change anything on disk (neither the labels nor the existence of images). Instead, it creates a 'cleaned.csv' file in your data path, from which you need to load your new databunch for the changes to be applied.
df = pd.read_csv(path/'cleaned.csv', header='infer')
# We create a databunch from our csv. We include the data in the training set and we don't use a validation set (DatasetFormatter uses only the training set)
np.random.seed(42)
db = (ImageList.from_df(df, path)
        .split_none()
        .label_from_df()
        .databunch(bs=64))
learn = cnn_learner(db, models.resnet18, metrics=error_rate)
learn = learn.load('stage-1')
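Before relying on the new databunch, it can help to check what a cleaning session actually changed. The sketch below uses only the standard library. It assumes that cleaned.csv has a header row with name and label columns, that name holds paths relative to the data folder, and that each image's original label is its parent folder name (as with label_from_folder). The helper summarize_cleaning is hypothetical and not part of fastai.

```python
import csv
from pathlib import Path

IMAGE_EXTS = {'.png', '.jpg', '.jpeg'}

def summarize_cleaning(path):
    """Compare cleaned.csv with the image files found under `path`.

    Returns (deleted, relabeled):
      deleted   - image files on disk that are absent from the csv
      relabeled - (name, old_label, new_label) for rows whose label changed
    Assumes 'name' and 'label' columns in the csv (an assumption, see lead-in).
    """
    path = Path(path)
    with open(path/'cleaned.csv', newline='') as f:
        rows = {r['name']: r['label'] for r in csv.DictReader(f)}
    # Image files on disk, as POSIX-style paths relative to `path`
    on_disk = sorted(p.relative_to(path).as_posix()
                     for p in path.rglob('*') if p.suffix.lower() in IMAGE_EXTS)
    deleted = [n for n in on_disk if n not in rows]
    # The original label is taken to be the parent folder name
    relabeled = [(n, Path(n).parent.name, lbl) for n, lbl in rows.items()
                 if Path(n).parent.name != lbl]
    return deleted, relabeled
```

Calling summarize_cleaning(path) after a session gives a quick view of which files were dropped and which were moved to a different class.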
You can then use ImageCleaner again to find duplicates in the dataset. To do this, you can specify duplicates=True while calling ImageCleaner after getting the indices and dataset from .from_similars. Note that if you are using a layer's output which has dimensions (n_batches, n_features, 1, 1) then you don't need any pooling (this is the case with the last layer). The suggested use of .from_similars() with resnets is using the last layer and no pooling, like in the following cell.
ds, idxs = DatasetFormatter().from_similars(learn, layer_ls=[0,7,1], pool=None)
ImageCleaner(ds, idxs, path, duplicates=True)
<fastai.widgets.image_cleaner.ImageCleaner at 0x7f9d3dfd53c8>
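To make the idea of finding duplicates from layer outputs more concrete, here is a small conceptual sketch in plain Python. It assumes each image is already represented by a flat, non-zero feature vector (for example, an (n_features, 1, 1) output with the trailing unit dimensions dropped) and ranks image pairs by cosine similarity. This illustrates the general technique only; it is not fastai's implementation of .from_similars, and the function names are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar_pairs(features, top_k=3):
    """Return the index pairs (i, j), i < j, with the highest cosine similarity."""
    scored = [(cosine_similarity(features[i], features[j]), i, j)
              for i, j in combinations(range(len(features)), 2)]
    scored.sort(reverse=True)  # most similar pairs first
    return [(i, j) for _, i, j in scored[:top_k]]
```

Pairs at the top of such a ranking are the candidates a duplicate-review step would show first.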
Displays a widget that allows searching and downloading images from Google Images search in a Jupyter Notebook or Lab.
The ImageDownloader widget gives you a way to quickly bootstrap your image dataset without leaving the notebook. It searches for and downloads images that match the search criteria and resolution / quality requirements, and stores them on your filesystem within the provided path.
Images for each search query (or label) are stored in a separate folder within path. For example, if you enter tiger as the search term with path set to ./data, you'll get a folder ./data/tiger/ with the tiger images in it.
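Given the one-folder-per-label layout described above, a quick way to see how many images each search produced is to count the files in each subfolder. The following is a minimal standard-library sketch; count_images_per_label and its default list of extensions are assumptions made for illustration, not part of fastai.

```python
from pathlib import Path

def count_images_per_label(path, exts=('.jpg', '.jpeg', '.png')):
    """Return {label_folder_name: number_of_image_files} for each subfolder of `path`."""
    counts = {}
    for folder in sorted(Path(path).iterdir()):
        if folder.is_dir():
            # Count only files whose extension looks like an image
            counts[folder.name] = sum(
                1 for f in folder.iterdir() if f.suffix.lower() in exts)
    return counts
```

For a path of ./data containing a tiger folder, the result would have a single tiger key mapped to the number of image files in ./data/tiger/.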
path = Config.data_path()/'image_downloader'
os.makedirs(path, exist_ok=True)
ImageDownloader(path)
<fastai.widgets.image_downloader.ImageDownloader at 0x7f9da36599b0>
path = Config.data_path()/'image_downloader'
files = download_google_images(path, 'aussie shepherd', size='>1024*768', n_images=30)
len(files)
Search for n_images images on Google, matching search_term and size requirements, download them into path/search_term and verify them, using max_workers threads.
# Setup path and labels to search for
path = Config.data_path()/'image_downloader'
labels = ['boston terrier', 'french bulldog']

# Download images
for label in labels:
    download_google_images(path, label, size='>400*300', n_images=50)

# Build a databunch and train!
src = (ImageList.from_folder(path)
        .split_by_rand_pct()
        .label_from_folder()
        .transform(get_transforms(), size=224))
db = src.databunch(bs=16, num_workers=0)
learn = cnn_learner(db, models.resnet34, metrics=[accuracy])
learn.fit_one_cycle(3)
Downloading more than a hundred images
To fetch more than a hundred images, ImageDownloader uses selenium and chromedriver to scroll through the Google Images search results page and scrape image URLs. They're not required as dependencies by default. If you don't have them installed on your system, the widget will show you an error message.
To install selenium, run pip install selenium in your fastai environment.
On a Mac, you can install chromedriver with brew cask install chromedriver.
On Ubuntu, take a look at the latest chromedriver version available, then run something like:
wget https://chromedriver.storage.googleapis.com/2.45/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
Note that downloading under 100 images doesn't require any dependencies other than fastai itself; downloading more than a hundred images, however, uses selenium and chromedriver.
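If you are unsure whether the optional dependencies are available, you can check from Python before requesting a large download. This is a minimal sketch using only the standard library; check_scraping_dependencies is a hypothetical helper, and it only tests that the selenium package is importable and that a chromedriver executable is on your PATH, not that their versions are compatible with your browser.

```python
import importlib.util
import shutil

def check_scraping_dependencies():
    """Report whether selenium is importable and chromedriver is on PATH."""
    return {
        'selenium': importlib.util.find_spec('selenium') is not None,
        'chromedriver': shutil.which('chromedriver') is not None,
    }

if __name__ == '__main__':
    for name, found in check_scraping_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```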
size can be one of:
'>400*300'
'>640*480'
'>800*600'
'>1024*768'
'>2MP'
'>4MP'
'>6MP'
'>8MP'
'>10MP'
'>12MP'
'>15MP'
'>20MP'
'>40MP'
'>70MP'
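When looping over many labels, it can be convenient to validate the size string once up front rather than discovering a typo partway through a batch of downloads. The sketch below simply checks a value against the list above; VALID_SIZES and check_size are hypothetical names introduced for illustration and are independent of whatever validation fastai performs internally.

```python
# Allowed values, copied from the list above
VALID_SIZES = (
    '>400*300', '>640*480', '>800*600', '>1024*768',
    '>2MP', '>4MP', '>6MP', '>8MP', '>10MP', '>12MP',
    '>15MP', '>20MP', '>40MP', '>70MP',
)

def check_size(size):
    """Return `size` unchanged if it is an allowed value, else raise ValueError."""
    if size not in VALID_SIZES:
        raise ValueError(
            f"Unsupported size {size!r}; expected one of: {', '.join(VALID_SIZES)}")
    return size
```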