cfg = fastai_cfg()
cfg.data,cfg.path('data')
('data', Path('/home/jhoward/.fastai/data'))
To download any of the datasets or pretrained weights, run untar_data with the corresponding URLs constant, like so:
path = untar_data(URLs.PETS)
path.ls()
>> (#7393) [Path('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/keeshond_34.jpg'),...]
To download pretrained model weights:
path = untar_data(URLs.WT103_BWD)
path.ls()
>> (#2) [Path('/home/ubuntu/.fastai/data/wt103-bwd/itos_wt103.pkl'),Path('/home/ubuntu/.fastai/data/wt103-bwd/lstm_bwd.pth')]
The datasets available by default inside the library are:
HUMAN_NUMBERS: A synthetic dataset consisting of human number counts in text, such as one, two, three, four. Useful for experimenting with Language Models.
IMDB: The full IMDB sentiment analysis dataset.
IMDB_SAMPLE: A sample of the full IMDB sentiment analysis dataset.
ML_SAMPLE: A movielens sample dataset for recommendation engines to recommend movies to users.
ML_100k: The movielens 100k dataset for recommendation engines to recommend movies to users.
MNIST_SAMPLE: A sample of the famous MNIST dataset consisting of handwritten digits.
MNIST_TINY: A tiny version of the famous MNIST dataset consisting of handwritten digits.
MNIST_VAR_SIZE_TINY:
PLANET_SAMPLE: A sample of the planets dataset from the Kaggle competition Planet: Understanding the Amazon from Space.
PLANET_TINY: A tiny version of the planets dataset from the Kaggle competition Planet: Understanding the Amazon from Space for faster experimentation and prototyping.
IMAGENETTE: A smaller version of the imagenet dataset pronounced just like ‘Imagenet’, except with a corny inauthentic French accent.
IMAGENETTE_160: The 160px version of the Imagenette dataset.
IMAGENETTE_320: The 320px version of the Imagenette dataset.
IMAGEWOOF: Imagewoof is a subset of 10 classes from Imagenet that aren’t so easy to classify, since they’re all dog breeds.
IMAGEWOOF_160: 160px version of the ImageWoof dataset.
IMAGEWOOF_320: 320px version of the ImageWoof dataset.
IMAGEWANG: Imagewang contains Imagenette and Imagewoof combined, but with some twists that make it into a tricky semi-supervised unbalanced classification problem.
IMAGEWANG_160: 160px version of Imagewang.
IMAGEWANG_320: 320px version of Imagewang.
SIIM_SMALL: A smaller version of the SIIM dataset where the objective is to classify pneumothorax from a set of chest radiographic images.
TCGA_SMALL: A smaller version of the TCGA-OV dataset with subcutaneous and visceral fat segmentations. Citations:
Holback, C., Jarosz, R., Prior, F., Mutch, D. G., Bhosale, P., Garcia, K., … Erickson, B. J. (2016). Radiology Data from The Cancer Genome Atlas Ovarian Cancer [TCGA-OV] collection. The Cancer Imaging Archive. paper
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. paper
fastai_cfg ()
Config object for fastai’s config.ini
This is a basic Config file that consists of data, model, storage and archive. All future downloads occur at the paths defined in the config file based on the type of download. For example, all future fastai datasets are downloaded to the data path while all pretrained model weights are downloaded to the model path, unless the default download location is updated. The config file directory is defined by the environment variable FASTAI_HOME if it exists; otherwise it is set to ~/.fastai.
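The lookup rule for the config directory can be illustrated with a few lines of standalone Python. This is a minimal sketch of the rule as described here, not fastai's own code; the helper name config_home is hypothetical.

```python
import os
from pathlib import Path

def config_home(env=None) -> Path:
    """Hypothetical helper: directory that would hold config.ini under the stated rule."""
    env = os.environ if env is None else env
    # FASTAI_HOME takes precedence when it is set; otherwise fall back to ~/.fastai
    return Path(env.get("FASTAI_HOME", "~/.fastai")).expanduser()

# Passing a dict instead of os.environ makes the rule easy to try out
print(config_home({"FASTAI_HOME": "/tmp/fastai-home"}))
print(config_home({}))
```

Taking the environment as an argument keeps the sketch free of side effects; calling config_home() with no argument reads the real environment.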
fastai_path (folder:str)
Local path to folder in Config
URLs ()
Global constants for dataset and model URLs.
The default local path is at ~/.fastai/archive/ but this can be updated by passing a different c_key. Note: c_key should be one of 'archive', 'data', 'model', 'storage'.
url = URLs.PETS
local_path = URLs.path(url)
test_eq(local_path.parent, fastai_path('archive'))
local_path
Path('/home/jhoward/.fastai/archive/oxford-iiit-pet.tgz')
local_path = URLs.path(url, c_key='model')
test_eq(local_path.parent, fastai_path('model'))
local_path
Path('/home/jhoward/.fastai/models/oxford-iiit-pet.tgz')
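The mapping from a URL and a c_key to a local path, as seen in the outputs above, can be approximated without fastai installed. The sketch below is an assumption-laden illustration: local_path and KEY_DIRS are hypothetical names, the URL is a placeholder, and only the directories visible in this page's outputs are included.

```python
from pathlib import Path

# Subdirectory used for each c_key, taken from the outputs shown on this page.
# 'storage' is left out because its directory does not appear here.
KEY_DIRS = {"archive": "archive", "data": "data", "model": "models"}

def local_path(url: str, c_key: str = "archive", base: str = "~/.fastai") -> Path:
    """Hypothetical helper: place the URL's file name under the directory for c_key."""
    if c_key not in KEY_DIRS:
        raise ValueError(f"unsupported c_key: {c_key!r}")
    fname = url.rstrip("/").split("/")[-1]  # last path component of the URL
    return Path(base).expanduser() / KEY_DIRS[c_key] / fname

url = "https://example.com/oxford-iiit-pet.tgz"  # placeholder URL, not the real one
print(local_path(url))
print(local_path(url, c_key="model"))
```

Note that the 'model' key resolves to a models directory, matching the second output above.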
untar_data (url:str, archive:pathlib.Path=None, data:pathlib.Path=None, c_key:str='data', force_download:bool=False, base:str='~/.fastai')
Download url using FastDownload.get
| | Type | Default | Details |
|---|---|---|---|
| url | str | | File to download |
| archive | Path | None | Optional override for Config’s archive key |
| data | Path | None | Optional override for Config’s data key |
| c_key | str | data | Key in Config where to extract file |
| force_download | bool | False | Setting to True will overwrite any existing copy of data |
| base | str | ~/.fastai | Directory containing config file and base of relative paths |
| Returns | Path | | Path to extracted file(s) |
untar_data is a thin wrapper for FastDownload.get. It downloads and extracts url, by default to subdirectories of ~/.fastai (see fastai_cfg for details), and returns the path to the extracted data. Setting the force_download flag to True will overwrite any existing copy of the data already present. For an explanation of the c_key parameter, see URLs.
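The effect of a force flag is easiest to see on a local archive. The following is a standalone sketch using only the standard library. It is not how FastDownload.get is implemented, and it assumes that an existing copy is reused when the flag is off. The helper extract_once is hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path

def extract_once(archive: Path, dest: Path, force: bool = False) -> Path:
    """Hypothetical helper: extract archive into dest, reusing an existing copy unless force."""
    if dest.exists() and not force:
        return dest  # an extracted copy is already present, so leave it alone
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tf:
        tf.extractall(dest)  # files with the same name are overwritten
    return dest

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / "hello.txt"
    src.write_text("original")
    archive = tmp / "sample.tgz"
    with tarfile.open(archive, "w:gz") as tf:
        tf.add(src, arcname="hello.txt")

    dest = extract_once(archive, tmp / "data")
    (dest / "hello.txt").write_text("edited locally")

    extract_once(archive, tmp / "data")              # no-op: copy already present
    kept = (dest / "hello.txt").read_text()
    extract_once(archive, tmp / "data", force=True)  # re-extract, replacing the edit
    restored = (dest / "hello.txt").read_text()
    print(kept, "|", restored)
```

With fastai itself, the corresponding call would be untar_data(URLs.PETS, force_download=True), which needs network access.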