test_dl in a data object.
It also ensures all the dataloaders are on device and applies tfms to them as batches are drawn (like normalization).
path is used internally to store temporary files. collate_fn is passed to the PyTorch DataLoader (replacing the one there) to specify how to collate the samples picked for a batch. By default, it grabs the data attribute of the objects sent (see vision.image or the data block API for why this can be important).
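As a rough illustration of the default collation described above, the sketch below builds a batch by grabbing the data attribute of each sample. The names ItemSketch and data_collate_sketch are hypothetical, and plain lists stand in for tensors; this is not the library's own collate function.

```python
class ItemSketch:
    """Hypothetical stand-in for an item that wraps its raw values in `.data`."""

    def __init__(self, data):
        self.data = data


def data_collate_sketch(batch):
    """Collate a list of (x, y) samples into (xs, ys), unwrapping `.data` when present."""

    def unwrap(obj):
        # Fall back to the object itself when it has no `.data` attribute.
        return getattr(obj, "data", obj)

    xs = [unwrap(x) for x, _ in batch]
    ys = [unwrap(y) for _, y in batch]
    return xs, ys


samples = [(ItemSketch([0.1, 0.2]), 0), (ItemSketch([0.3, 0.4]), 1)]
xs, ys = data_collate_sketch(samples)
```

A function of this shape is what would be handed over as collate_fn, so that the model receives raw values rather than wrapper objects.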
valid_dl and optionally
test_dl will be wrapped in
valid_ds and maybe
test_ds with a batch size of
num_workers is the number of CPUs to use,
collate_fn are passed to the init method.
path/fname a saved
Adds a transform to all dataloaders.
If you want to use your PyTorch Dataset in fastai, you may need to implement additional attributes and methods to get the full functionality of the library. Some functions work with your PyTorch Dataset once you add a single attribute; for others, the best approach is to create your own ItemList by following this tutorial. Here is a full list of what the library will expect.
First of all, you obviously need to implement the methods
__getitem__, as indicated by the PyTorch docs. Then the most commonly needed things are:
- c attribute: it's used in most functions that directly create a
create_cnn) and represents the number of outputs of the final layer of your model (also the number of classes if applicable).
- classes attribute: it's used by
ClassificationInterpretation and also in
collab_learner (it is better to use
CollabDataBunch.from_df than a PyTorch
Dataset) and represents the unique tags that appear in your data.
- maybe a
loss_func attribute: that is going to be used by
Learner as a default loss function, so if you know your custom
Dataset requires a particular loss, you can set it here.
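The attributes listed above can be sketched on a small map-style dataset. This is a minimal, dependency-free sketch: the class name LabelledListDataset is hypothetical, it does not subclass torch.utils.data.Dataset as a real dataset normally would, and loss_func is left unset rather than pointing at a real loss.

```python
class LabelledListDataset:
    """Minimal map-style dataset sketch exposing `c`, `classes` and `loss_func`."""

    def __init__(self, xs, ys):
        assert len(xs) == len(ys), "inputs and labels must have the same length"
        self.xs, self.ys = xs, ys
        # Unique tags that appear in the data.
        self.classes = sorted(set(ys))
        # Number of outputs of the final layer (here, the number of classes).
        self.c = len(self.classes)
        # Optional default loss; left unset in this sketch.
        # With PyTorch installed this could be e.g. torch.nn.CrossEntropyLoss().
        self.loss_func = None

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        # Return the input together with the index of its label in `classes`.
        return self.xs[i], self.classes.index(self.ys[i])


ds = LabelledListDataset([[1.0], [2.0], [3.0]], ["cat", "dog", "cat"])
```

Here ds.c and ds.classes are derived from the labels, so they stay consistent with the data without being maintained by hand.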
In tabular, your dataset will need to have a
cont_names attribute (for the names of continuous variables) and a
get_emb_szs method that returns a list of tuples
(n_classes, emb_sz) representing, for each categorical variable, the number of different codes (don't forget to add 1 for nan) and the corresponding embedding size. Those two are used with the
c attribute by
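A tabular dataset with these two members might look like the sketch below. Several things here are assumptions made for illustration: the class name TabularDatasetSketch, the cat_names attribute, the row-as-dict layout, and in particular the embedding size rule, which is a simple placeholder heuristic rather than the library's own.

```python
class TabularDatasetSketch:
    """Hypothetical tabular dataset exposing `cont_names` and `get_emb_szs`."""

    def __init__(self, rows, cat_names, cont_names):
        self.rows = rows              # list of dicts, one per sample
        self.cat_names = cat_names    # assumed attribute: names of categorical variables
        self.cont_names = cont_names  # names of continuous variables

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        return self.rows[i]

    def get_emb_szs(self):
        """Return one (n_classes, emb_sz) tuple per categorical variable."""
        szs = []
        for name in self.cat_names:
            codes = {row[name] for row in self.rows if row[name] is not None}
            n_classes = len(codes) + 1  # +1 reserves a code for nan / missing values
            emb_sz = min(50, n_classes // 2 + 1)  # assumed placeholder heuristic
            szs.append((n_classes, emb_sz))
        return szs


rows = [
    {"city": "Paris", "age": 30.0},
    {"city": "Lyon", "age": 41.0},
    {"city": None, "age": 25.0},
    {"city": "Paris", "age": 37.0},
]
tab_ds = TabularDatasetSketch(rows, cat_names=["city"], cont_names=["age"])
emb_szs = tab_ds.get_emb_szs()
```

With two distinct cities plus the reserved missing-value code, the single categorical variable gets three codes.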
Put the batches of
device after applying an optional list of
collate_fn will replace the one of
dl. All dataloaders of a
DataBunch are of this type.
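The idea of wrapping a dataloader so that transforms are applied to each batch as it is drawn can be sketched as follows. BatchTransformLoaderSketch and normalize_sketch are hypothetical names, a plain list of batches stands in for a real dataloader, and moving the batch to a device is deliberately left out.

```python
class BatchTransformLoaderSketch:
    """Wrap an iterable of batches and apply `tfms` to each batch as it is drawn."""

    def __init__(self, dl, tfms=None):
        self.dl = dl
        self.tfms = list(tfms or [])

    def __len__(self):
        return len(self.dl)

    def add_tfm(self, tfm):
        # Register one more transform to run on every batch.
        self.tfms.append(tfm)

    def __iter__(self):
        for batch in self.dl:
            # Transforms run in order, each receiving the previous one's output.
            for tfm in self.tfms:
                batch = tfm(batch)
            yield batch


def normalize_sketch(batch):
    """Center the inputs of a batch around zero; targets are left untouched."""
    xs, ys = batch
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs], ys


batches = [([1.0, 3.0], [0, 1]), ([2.0, 6.0], [1, 0])]
loader = BatchTransformLoaderSketch(batches, tfms=[normalize_sketch])
out = list(loader)
```

Because the transforms run inside `__iter__`, the wrapped batches are never modified in place and the same underlying loader can be wrapped with different transform lists.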
collate_fn will be used to put the samples together in one batch (by default it grabs their data attribute).
shuffle means the dataloader will take the samples randomly if that flag is set to
True, or in the right order otherwise.
tfms are passed to the init method. All
kwargs are passed to the PyTorch
DataLoader class initialization.
Enum = [Train, Valid, Test, Single, Fix]
Internal enumerator to name the training, validation and test dataset/dataloader.
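An enumerator with these five members can be written with Python's standard enum module, as in the sketch below. The name DatasetKindSketch and the helper pick_loader are hypothetical and only show how such members might be used as keys.

```python
from enum import Enum

# Functional API: member names are given as a space-separated string.
DatasetKindSketch = Enum("DatasetKindSketch", "Train Valid Test Single Fix")


def pick_loader(loaders, kind):
    """Look up the loader registered for `kind` in a dict keyed by enum members."""
    return loaders[kind]


loaders = {
    DatasetKindSketch.Train: "train loader placeholder",
    DatasetKindSketch.Valid: "valid loader placeholder",
}
chosen = pick_loader(loaders, DatasetKindSketch.Valid)
```

Using enum members instead of bare strings means a misspelled name fails immediately with an AttributeError rather than silently selecting nothing.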