```python
#For example, so not exported
from fastai.vision.core import *
from fastai.vision.data import *
```
Data block

High level API to quickly get your data in a `DataLoaders`
TransformBlock
```python
TransformBlock (type_tfms:list=None, item_tfms:list=None, batch_tfms:list=None,
                dl_type:fastai.data.core.TfmdDL=None, dls_kwargs:dict=None)
```
A basic wrapper that links default transforms for the data block API.
| | Type | Default | Details |
|---|---|---|---|
| type_tfms | list | None | One or more `Transform`s |
| item_tfms | list | None | `ItemTransform`s, applied on an item |
| batch_tfms | list | None | `Transform`s or `RandTransform`s, applied by batch |
| dl_type | TfmdDL | None | Task-specific `TfmdDL`, defaults to `TfmdDL` |
| dls_kwargs | dict | None | Additional arguments to be passed to `DataLoaders` |
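To see how these pieces fit together, here is a minimal sketch of an image block in the spirit of fastai's own `ImageBlock` (the name `MyImageBlock` is made up for illustration; the real definition lives in `fastai.vision.data`):

```python
from fastai.vision.all import *

def MyImageBlock(cls=PILImage):
    # type_tfms: `cls.create` turns a filename into an image of type `cls`
    # batch_tfms: `IntToFloatTensor` converts the uint8 batch to floats
    return TransformBlock(type_tfms=cls.create, batch_tfms=IntToFloatTensor)
```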
CategoryBlock
```python
CategoryBlock (vocab:collections.abc.MutableSequence|pandas.core.series.Series=None,
               sort:bool=True, add_na:bool=False)
```
`TransformBlock` for single-label categorical targets
| | Type | Default | Details |
|---|---|---|---|
| vocab | collections.abc.MutableSequence \| pandas.core.series.Series | None | List of unique class names |
| sort | bool | True | Sort the classes alphabetically |
| add_na | bool | False | Add `#na#` to `vocab` |
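For example, to pin the class-to-index mapping instead of letting it be inferred and sorted, you can pass `vocab` explicitly (a minimal sketch; the class names are made up):

```python
from fastai.data.all import *

# An explicit vocab fixes the label order; sort=False keeps it as given,
# and add_na=True adds an '#na#' class for unlabelled items.
blk = CategoryBlock(vocab=['dog', 'cat'], sort=False, add_na=True)
```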
MultiCategoryBlock
```python
MultiCategoryBlock (encoded:bool=False,
                    vocab:collections.abc.MutableSequence|pandas.core.series.Series=None,
                    add_na:bool=False)
```
`TransformBlock` for multi-label categorical targets
| | Type | Default | Details |
|---|---|---|---|
| encoded | bool | False | Whether the data is already one-hot encoded |
| vocab | collections.abc.MutableSequence \| pandas.core.series.Series | None | List of unique class names |
| add_na | bool | False | Add `#na#` to `vocab` |
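If the targets already arrive one-hot encoded (e.g. 0/1 columns in a dataframe), `encoded=True` tells the block to skip the encoding step; a minimal sketch with made-up class names:

```python
from fastai.data.all import *

# Targets are already 0/1 vectors, so only the vocab is needed for decoding/display.
blk = MultiCategoryBlock(encoded=True, vocab=['beach', 'mountain', 'city'])
```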
RegressionBlock
```python
RegressionBlock (n_out:int=None)
```
`TransformBlock` for float targets
| | Type | Default | Details |
|---|---|---|---|
| n_out | int | None | Number of output values |
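For instance, regressing a pair of coordinates per item would use `n_out=2` (a minimal sketch; `n_out` mainly tells downstream code how many continuous outputs to expect):

```python
from fastai.data.all import *

# Two float targets per item, e.g. an (x, y) point.
blk = RegressionBlock(n_out=2)
```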
General API
DataBlock
```python
DataBlock (blocks:list=None, dl_type:TfmdDL=None, getters:list=None, n_inp:int=None,
           item_tfms:list=None, batch_tfms:list=None, get_items=None, splitter=None,
           get_y=None, get_x=None)
```
Generic container to quickly build `Datasets` and `DataLoaders`.
| | Type | Default | Details |
|---|---|---|---|
| blocks | list | None | One or more `TransformBlock`s |
| dl_type | TfmdDL | None | Task-specific `TfmdDL`, defaults to the block's `dl_type` or `TfmdDL` |
| getters | list | None | Getter functions applied to the results of `get_items` |
| n_inp | int | None | Number of inputs |
| item_tfms | list | None | `ItemTransform`s, applied on an item |
| batch_tfms | list | None | `Transform`s or `RandTransform`s, applied by batch |
| get_items | NoneType | None | |
| splitter | NoneType | None | |
| get_y | NoneType | None | |
| get_x | NoneType | None | |
To build a `DataBlock` you need to give the library four things: the types of your input/labels, and at least two functions: `get_items` and `splitter`. You may also need to include `get_x` and `get_y`, or a more generic list of `getters` that are applied to the results of `get_items`.
`splitter` is a callable which, when called with `items`, returns a tuple of iterables representing the indices of the training and validation data.
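fastai provides ready-made splitters such as `RandomSplitter` and `GrandparentSplitter`, but any callable with that contract works. A minimal hand-rolled sketch (the function name is made up):

```python
def every_fifth_splitter(items):
    # Must return (train_idxs, valid_idxs) when called with the items:
    # here every fifth item goes to validation, the rest to training.
    valid = [i for i in range(len(items)) if i % 5 == 0]
    train = [i for i in range(len(items)) if i % 5 != 0]
    return train, valid
```

You would then pass it as `DataBlock(..., splitter=every_fifth_splitter)`.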
Once those are provided, you automatically get a `Datasets` or a `DataLoaders`:
DataBlock.datasets
```python
DataBlock.datasets (source, verbose:bool=False)
```
Create a `Datasets` object from `source`
| | Type | Default | Details |
|---|---|---|---|
| source | | | The data source |
| verbose | bool | False | Show verbose messages |
| Returns | Datasets | | |
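A minimal call sketch, assuming `dblock` is a `DataBlock` you have already defined and `path` points at your data:

```python
dsets = dblock.datasets(path, verbose=True)  # verbose=True prints each setup step
x, y = dsets.train[0]                        # items are (input, target) tuples
```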
DataBlock.dataloaders
```python
DataBlock.dataloaders (source, path:str='.', verbose:bool=False, bs:int=64,
                       shuffle:bool=False, num_workers:int=None, do_setup:bool=True,
                       pin_memory=False, timeout=0, batch_size=None, drop_last=False,
                       indexed=None, n=None, device=None, persistent_workers=False,
                       pin_memory_device='', wif=None, before_iter=None, after_item=None,
                       before_batch=None, after_batch=None, after_iter=None,
                       create_batches=None, create_item=None, create_batch=None,
                       retain=None, get_idxs=None, sample=None, shuffle_fn=None,
                       do_batch=None)
```
Create a `DataLoaders` object from `source`
| | Type | Default | Details |
|---|---|---|---|
| source | | | The data source |
| path | str | . | Data source and default `Learner` path |
| verbose | bool | False | Show verbose messages |
| bs | int | 64 | Size of batch |
| shuffle | bool | False | Whether to shuffle data |
| num_workers | int | None | Number of CPU cores to use in parallel (default: all available, up to 16) |
| do_setup | bool | True | Whether to run `setup()` for batch transform(s) |
| pin_memory | bool | False | |
| timeout | int | 0 | |
| batch_size | NoneType | None | |
| drop_last | bool | False | |
| indexed | NoneType | None | |
| n | NoneType | None | |
| device | NoneType | None | |
| persistent_workers | bool | False | |
| pin_memory_device | str | '' | |
| wif | NoneType | None | |
| before_iter | NoneType | None | |
| after_item | NoneType | None | |
| before_batch | NoneType | None | |
| after_batch | NoneType | None | |
| after_iter | NoneType | None | |
| create_batches | NoneType | None | |
| create_item | NoneType | None | |
| create_batch | NoneType | None | |
| retain | NoneType | None | |
| get_idxs | NoneType | None | |
| sample | NoneType | None | |
| shuffle_fn | NoneType | None | |
| do_batch | NoneType | None | |
| Returns | DataLoaders | | |
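Most of these keywords are passed straight through to the underlying `TfmdDL`; a typical call only touches a few of them (same `dblock`/`path` assumptions as in the sketch above):

```python
dls = dblock.dataloaders(path, bs=32, num_workers=4)
xb, yb = dls.one_batch()  # pull one batch to sanity-check shapes and device
```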
You can create a `DataBlock` by passing functions:
```python
mnist = DataBlock(blocks = (ImageBlock(cls=PILImageBW),CategoryBlock),
                  get_items = get_image_files,
                  splitter = GrandparentSplitter(),
                  get_y = parent_label)
```
Each type comes with default transforms that will be applied:
- at the base level to create items in a tuple (usually input,target) from the base elements (like filenames)
- at the item level of the datasets
- at the batch level
They are called respectively type transforms, item transforms, and batch transforms. In the case of MNIST, the type transforms are the method to create a `PILImageBW` (for the input) and the `Categorize` transform (for the target), the item transform is `ToTensor`, and the batch transforms are `Cuda` and `IntToFloatTensor`. You can add any other transforms by passing them in `DataBlock.datasets` or `DataBlock.dataloaders`.
```python
test_eq(mnist.type_tfms[0], [PILImageBW.create])
test_eq(mnist.type_tfms[1].map(type), [Categorize])
test_eq(mnist.default_item_tfms.map(type), [ToTensor])
test_eq(mnist.default_batch_tfms.map(type), [IntToFloatTensor])
```
```python
dsets = mnist.datasets(untar_data(URLs.MNIST_TINY))
test_eq(dsets.vocab, ['3', '7'])
x,y = dsets.train[0]
test_eq(x.size,(28,28))
show_at(dsets.train, 0, cmap='Greys', figsize=(2,2));
```
```python
test_fail(lambda: DataBlock(wrong_kwarg=42, wrong_kwarg2='foo'))
```
We can pass any number of blocks to `DataBlock`; we can then define what the input and target blocks are by changing `n_inp`. For example, defining `n_inp=2` will consider the first two blocks passed as inputs and the others as targets.
```python
mnist = DataBlock((ImageBlock, ImageBlock, CategoryBlock), get_items=get_image_files, splitter=GrandparentSplitter(),
                  get_y=parent_label)
dsets = mnist.datasets(untar_data(URLs.MNIST_TINY))
test_eq(mnist.n_inp, 2)
test_eq(len(dsets.train[0]), 3)
```
```python
test_fail(lambda: DataBlock((ImageBlock, ImageBlock, CategoryBlock), get_items=get_image_files, splitter=GrandparentSplitter(),
                            get_y=[parent_label, noop],
                            n_inp=2), msg='get_y contains 2 functions, but must contain 1 (one for each output)')
```
```python
mnist = DataBlock((ImageBlock, ImageBlock, CategoryBlock), get_items=get_image_files, splitter=GrandparentSplitter(),
                  n_inp=1,
                  get_y=[noop, Pipeline([noop, parent_label])])
dsets = mnist.datasets(untar_data(URLs.MNIST_TINY))
test_eq(len(dsets.train[0]), 3)
```
Debugging
DataBlock.summary
```python
DataBlock.summary (source, bs:int=4, show_batch:bool=False, **kwargs)
```
Steps through the transform pipeline for one batch, and optionally calls `show_batch(**kwargs)` on the transient `DataLoaders`.
| | Type | Default | Details |
|---|---|---|---|
| source | | | The data source |
| bs | int | 4 | The batch size |
| show_batch | bool | False | Call `show_batch` after the summary |
| kwargs | | | |
Besides stepping through the transforms, `summary()` provides a shortcut to `dls.show_batch(...)` so you can see the data. E.g.

```python
pets.summary(path/"images", bs=8, show_batch=True, unique=True,...)
```

is a shortcut to:
```python
pets.summary(path/"images", bs=8)
dls = pets.dataloaders(path/"images", bs=8)
dls.show_batch(unique=True,...) # See different tfms effect on the same image.
```