Tabular learner

The function to immediately get a Learner ready to train for tabular data

The main function you probably want to use in this module is tabular_learner. It will automatically create a TabularModel suitable for your data and infer the right loss function. See the tabular tutorial for an example of use in context.

Main functions



Learner for tabular data

It works exactly like a normal Learner; the only difference is that it implements a predict method that works on a single row of data.



 tabular_learner (dls:TabularDataLoaders, layers:list=None,
                  emb_szs:list=None, config:dict=None, n_out:int=None,
                  y_range:tuple=None, loss_func:callable|None=None,
                  opt_func=<function Adam>, lr=0.001,
                  splitter:callable=<function trainable_params>, cbs=None,
                  metrics=None, path=None, model_dir='models', wd=None,
                  wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95),
                  default_cbs:bool=True)

Get a Learner using dls, with metrics, including a TabularModel created using the remaining params.

|  | Type | Default | Details |
|---|---|---|---|
| dls | TabularDataLoaders |  | DataLoaders containing data for each dataset needed for model |
| layers | list | None | Size of the layers generated by LinBnDrop |
| emb_szs | list | None | Tuples of n_unique, embedding_size for all categorical features |
| config | dict | None | Config params for TabularModel from tabular_config |
| n_out | int | None | Final output size of the model |
| y_range | Tuple[float, float] | None | Low and high for the final sigmoid function |
| loss_func | callable \| None | None | Loss function for training |
| opt_func | function | Adam | Optimisation function for training |
| lr | float | 0.001 | Learning rate |
| splitter | callable | trainable_params | Used to split parameters into layer groups |
| cbs | NoneType | None | Callbacks |
| metrics | NoneType | None | Printed after each epoch |
| path | NoneType | None | Parent directory to save, load, and export models |
| model_dir | str | models | Subdirectory to save and load models |
| wd | NoneType | None | Weight decay |
| wd_bn_bias | bool | False | Apply weight decay to batchnorm bias params? |
| train_bn | bool | True | Always train batchnorm layers? |
| moms | tuple | (0.95, 0.85, 0.95) | Momentum |
| default_cbs | bool | True | Include default callbacks? |

If your data was built with fastai, you probably won't need to pass anything to emb_szs unless you want to change the library defaults (computed by get_emb_sz); the same goes for n_out, which should be inferred automatically. layers defaults to [200,100] and is passed to TabularModel along with the config.
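As a rough sketch of the default embedding-size heuristic, here is the rule fastai's emb_sz_rule applies per categorical column (a pure-Python reimplementation based on the library source; check get_emb_sz for the authoritative version):

```python
# Sketch of fastai's default embedding-size heuristic (emb_sz_rule):
# the embedding width grows sub-linearly with the number of categories
# and is capped at 600. get_emb_sz applies this per categorical column
# unless you override it via emb_szs.
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

# A binary column gets a tiny embedding; a huge vocabulary hits the cap.
for n in (2, 10, 1000, 1_000_000):
    print(n, emb_sz_rule(n))
```

Passing emb_szs=[(n_unique, size), ...] simply replaces the sizes this rule would have produced.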

Use tabular_config to create a config and customize the model used. y_range is exposed directly as an argument because it is used so often.
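To make the effect of y_range concrete, here is a minimal pure-Python sketch of the scaling it applies (fastai does this with sigmoid_range on the model's final activations; this standalone version is for illustration only):

```python
import math

# Sketch of what y_range=(low, high) does to the model's raw output:
# the activation is squashed through a sigmoid, then rescaled so that
# predictions always land in [low, high].
def sigmoid_range(x: float, low: float, high: float) -> float:
    return 1 / (1 + math.exp(-x)) * (high - low) + low

# With y_range=(0, 5), any raw activation maps into [0, 5]:
print(sigmoid_range(0.0, 0, 5))    # midpoint of the range: 2.5
print(sigmoid_range(-10.0, 0, 5))  # close to 0
print(sigmoid_range(10.0, 0, 5))   # close to 5
```

This is mainly useful for regression targets with known bounds, e.g. ratings between 0 and 5.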

All the other arguments are passed to Learner.

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
dls = TabularDataLoaders.from_df(df, path, procs=procs, cat_names=cat_names, cont_names=cont_names, 
                                 y_names="salary", valid_idx=list(range(800,1000)), bs=64)
learn = tabular_learner(dls)



 TabularLearner.predict (row:pandas.core.series.Series)

Predict on a single sample

|  | Type | Details |
|---|---|---|
| row | pd.Series | Features to be predicted |

We can pass an individual row of data into our TabularLearner’s predict method. Its output is slightly different from the other predict methods, as this one will always return the input as well:

row, clas, probs = learn.predict(df.iloc[0])
|  | workclass | education | marital-status | occupation | relationship | race | education-num_na | age | fnlwgt | education-num | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Private | Assoc-acdm | Married-civ-spouse | #na# | Wife | White | False | 49.0 | 101320.001685 | 12.0 | <50k |
clas, probs
(tensor(0), tensor([0.5264, 0.4736]))
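The class index can be decoded back to a label through the DataLoaders' vocabulary. A minimal sketch, using a hypothetical vocab list for illustration (in practice the ordering comes from learn.dls.vocab, built by Categorify):

```python
# Hypothetical vocab for the salary target; the real ordering comes
# from learn.dls.vocab after Categorify processes the data.
vocab = ['<50k', '>=50k']

clas = 0                 # the class index returned by predict
label = vocab[clas]      # decoded, human-readable label
print(label)
```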