Tabular model

A basic model that can be used on tabular data

Embeddings


source

emb_sz_rule

 emb_sz_rule (n_cat:int)

Rule of thumb to pick embedding size corresponding to n_cat

|         | Type | Details                   |
|---------|------|---------------------------|
| n_cat   | int  | Cardinality of a category |
| Returns | int  |                           |

Through trial and error, this general rule takes the lower of two values:

  • A dimension space of 600
  • A dimension space equal to 1.6 times the cardinality of the variable raised to the power of 0.56.

This provides a good starting point for the embedding size of each of your variables. More advanced users who want to lean into this heuristic can tweak these values at their discretion; it is not uncommon for slight adjustments to this general formula to give better results.
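Concretely, the rule corresponds to min(600, round(1.6 * n_cat**0.56)). A minimal sketch of how it behaves for a few cardinalities (the exact values shown assume the default fastai implementation):

from fastai.tabular.model import emb_sz_rule
# roughly min(600, round(1.6 * n_cat**0.56))
emb_sz_rule(4)        # 3   -- small categories get small embeddings
emb_sz_rule(1_000)    # 77
emb_sz_rule(100_000)  # 600 -- capped for very high-cardinality variables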


source

get_emb_sz

 get_emb_sz (to:fastai.tabular.core.Tabular|fastai.tabular.core.TabularPandas,
             sz_dict:dict=None)

Get embedding size for each cat_name in Tabular or TabularPandas, or populate embedding size manually using sz_dict

|         | Type                     | Default | Details                                                                   |
|---------|--------------------------|---------|---------------------------------------------------------------------------|
| to      | Tabular \| TabularPandas |         |                                                                           |
| sz_dict | dict                     | None    | Dictionary of {'class_name': size, …} to override default emb_sz_rule     |
| Returns | list                     |         | List of embedding sizes for each category                                 |
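A minimal sketch of get_emb_sz on a toy TabularPandas (the DataFrame and column names here are made up for illustration; note that Categorify adds an #na# class, so the reported cardinality is one higher than the number of distinct values):

from fastai.tabular.all import TabularPandas, Categorify, get_emb_sz
import pandas as pd

df = pd.DataFrame({'a': ['x','y','z','x'], 'b': [1.0, 2.0, 3.0, 4.0], 'y': [0, 1, 0, 1]})
to = TabularPandas(df, procs=[Categorify], cat_names=['a'], cont_names=['b'], y_names='y')
get_emb_sz(to)                    # [(4, 3)]: cardinality 4 (x, y, z, #na#), size 3 from emb_sz_rule
get_emb_sz(to, sz_dict={'a': 8})  # [(4, 8)]: size overridden via sz_dict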

source

TabularModel

 TabularModel (emb_szs:list, n_cont:int, out_sz:int, layers:list,
               ps:float|MutableSequence=None, embed_p:float=0.0,
               y_range=None, use_bn:bool=True, bn_final:bool=False,
               bn_cont:bool=True, act_cls=ReLU(inplace=True),
               lin_first:bool=True)

Basic model for tabular data.

|           | Type                     | Default            | Details                                                                              |
|-----------|--------------------------|--------------------|--------------------------------------------------------------------------------------|
| emb_szs   | list                     |                    | Sequence of (num_embeddings, embedding_dim) for each categorical variable             |
| n_cont    | int                      |                    | Number of continuous variables                                                        |
| out_sz    | int                      |                    | Number of outputs for final LinBnDrop layer                                           |
| layers    | list                     |                    | Sequence of ints used to specify the input and output size of each LinBnDrop layer    |
| ps        | float \| MutableSequence | None               | Sequence of dropout probabilities for LinBnDrop                                       |
| embed_p   | float                    | 0.0                | Dropout probability for Embedding layer                                               |
| y_range   | NoneType                 | None               | Low and high for SigmoidRange activation                                              |
| use_bn    | bool                     | True               | Use BatchNorm1d in LinBnDrop layers                                                   |
| bn_final  | bool                     | False              | Use BatchNorm1d on final layer                                                        |
| bn_cont   | bool                     | True               | Use BatchNorm1d on continuous variables                                               |
| act_cls   | ReLU                     | ReLU(inplace=True) | Activation type for LinBnDrop layers                                                  |
| lin_first | bool                     | True               | Linear layer is first or last in LinBnDrop layers                                     |

This model expects your cat and cont variables to be passed separately. cat is passed through an Embedding layer and optional Dropout, while cont is passed through an optional BatchNorm1d. Afterwards both are concatenated and passed through a series of LinBnDrop layers, before a final Linear layer sized to the expected number of outputs.

emb_szs = [(4,2), (17,8)]  # (num_embeddings, embedding_dim) for each categorical variable
m = TabularModel(emb_szs, n_cont=2, out_sz=2, layers=[200,100]).eval()
x_cat = torch.tensor([[2,12]]).long()               # one row of categorical indices
x_cont = torch.tensor([[0.7633, -0.1887]]).float()  # one row of continuous values
out = m(x_cat, x_cont)
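When predicting a bounded continuous target, you can pass y_range to append a SigmoidRange activation after the final Linear layer. A minimal sketch reusing the tensors above (the (0, 5) bounds are just an illustrative assumption):

m_reg = TabularModel(emb_szs, n_cont=2, out_sz=1, layers=[200,100], y_range=(0,5)).eval()
m_reg(x_cat, x_cont)  # the single output is squashed into (0, 5)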

source

tabular_config

 tabular_config (ps:float|MutableSequence=None, embed_p:float=0.0,
                 y_range=None, use_bn:bool=True, bn_final:bool=False,
                 bn_cont:bool=True, act_cls=ReLU(inplace=True),
                 lin_first:bool=True)

Convenience function to easily create a config for TabularModel

|           | Type                     | Default            | Details                                           |
|-----------|--------------------------|--------------------|---------------------------------------------------|
| ps        | float \| MutableSequence | None               | Sequence of dropout probabilities for LinBnDrop   |
| embed_p   | float                    | 0.0                | Dropout probability for Embedding layer           |
| y_range   | NoneType                 | None               | Low and high for SigmoidRange activation          |
| use_bn    | bool                     | True               | Use BatchNorm1d in LinBnDrop layers               |
| bn_final  | bool                     | False              | Use BatchNorm1d on final layer                    |
| bn_cont   | bool                     | True               | Use BatchNorm1d on continuous variables           |
| act_cls   | ReLU                     | ReLU(inplace=True) | Activation type for LinBnDrop layers              |
| lin_first | bool                     | True               | Linear layer is first or last in LinBnDrop layers |

Any direct setup of TabularModel’s internals should be passed through here:

config = tabular_config(embed_p=0.6, use_bn=False); config
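The resulting dict is typically handed to tabular_learner, which forwards these options to TabularModel when it builds the model. A minimal sketch, assuming a TabularDataLoaders object named dls already exists:

# `dls` is assumed to be an existing TabularDataLoaders
learn = tabular_learner(dls, layers=[200,100], config=config)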