Tabular model

A basic model that can be used on tabular data

Embeddings


source

emb_sz_rule


def emb_sz_rule(
    n_cat:int, # Cardinality of a category
)->int:

Rule of thumb to pick embedding size corresponding to n_cat

Through trial and error, this general rule takes the lower of two values:

  • A dimension space of 600
  • A dimension space equal to 1.6 times the cardinality of the variable raised to the power of 0.56.

This provides a good starting point for an embedding space for your variables. More advanced users can tweak these values at their discretion; it is not uncommon for slight adjustments to this general formula to yield better results.
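The rule described above can be sketched as a one-liner (this matches the published fastai heuristic, but check the library source for the authoritative version):

```python
def emb_sz_rule(n_cat: int) -> int:
    "Pick an embedding size for a categorical variable of cardinality `n_cat`."
    # Lower of 600 and 1.6 * n_cat ** 0.56, rounded to the nearest integer
    return min(600, round(1.6 * n_cat ** 0.56))
```

For example, a variable with 17 categories gets an embedding of size 8, while very high-cardinality variables are capped at 600 dimensions.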


source

get_emb_sz


def get_emb_sz(
    to:Tabular | TabularPandas,
    sz_dict:dict=None, # Dictionary of {'class_name' : size, ...} to override default [`emb_sz_rule`](https://docs.fast.ai/tabular.model.html#emb_sz_rule)
)->list: # List of embedding sizes for each category

Get the embedding size for each cat_name in the Tabular or TabularPandas object, or override sizes manually using sz_dict
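A hypothetical standalone sketch of the resolution logic: for each categorical column, take the explicit size from `sz_dict` if one is given, otherwise fall back to `emb_sz_rule`. (The real `get_emb_sz` reads the cardinalities from the `Tabular` object's category maps; here they are passed in as a plain dict for illustration.)

```python
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

def get_emb_sz(classes: dict, sz_dict: dict = None) -> list:
    "`classes` maps column name -> number of categories for that column."
    sz_dict = sz_dict or {}
    # One (num_embeddings, embedding_dim) tuple per categorical column,
    # preferring an explicit override from sz_dict when present
    return [(n_cat, sz_dict.get(name, emb_sz_rule(n_cat)))
            for name, n_cat in classes.items()]

sizes = get_emb_sz({'workclass': 10, 'education': 17}, sz_dict={'education': 8})
# -> [(10, 6), (17, 8)]
```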


source

TabularModel


def TabularModel(
    emb_szs:list, # Sequence of (num_embeddings, embedding_dim) for each categorical variable
    n_cont:int, # Number of continuous variables
    out_sz:int, # Number of outputs for final [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop) layer
    layers:list, # Sequence of ints used to specify the input and output size of each [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop) layer
    ps:float | MutableSequence=None, # Sequence of dropout probabilities for [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop)
    embed_p:float=0.0, # Dropout probability for [`Embedding`](https://docs.fast.ai/layers.html#embedding) layer
    y_range:NoneType=None, # Low and high for [`SigmoidRange`](https://docs.fast.ai/layers.html#sigmoidrange) activation
    use_bn:bool=True, # Use `BatchNorm1d` in [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop) layers
    bn_final:bool=False, # Use `BatchNorm1d` on final layer
    bn_cont:bool=True, # Use `BatchNorm1d` on continuous variables
    act_cls:ReLU=ReLU(inplace=True), # Activation type for [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop) layers
    lin_first:bool=True, # Linear layer is first or last in [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop) layers
):

Basic model for tabular data.

This model expects your categorical and continuous variables to be passed in separately. The categorical variables are passed through an Embedding layer and potential Dropout, while the continuous variables are passed through a potential BatchNorm1d. Afterwards both are concatenated and passed through a series of LinBnDrop layers, before a final Linear layer corresponding to the expected outputs.

import torch

emb_szs = [(4,2), (17,8)]
m = TabularModel(emb_szs, n_cont=2, out_sz=2, layers=[200,100]).eval()
x_cat = torch.tensor([[2,12]]).long()
x_cont = torch.tensor([[0.7633, -0.1887]]).float()
out = m(x_cat, x_cont)

source

tabular_config


def tabular_config(
    ps:float | MutableSequence=None, # Sequence of dropout probabilities for [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop)
    embed_p:float=0.0, # Dropout probability for [`Embedding`](https://docs.fast.ai/layers.html#embedding) layer
    y_range:NoneType=None, # Low and high for [`SigmoidRange`](https://docs.fast.ai/layers.html#sigmoidrange) activation
    use_bn:bool=True, # Use `BatchNorm1d` in [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop) layers
    bn_final:bool=False, # Use `BatchNorm1d` on final layer
    bn_cont:bool=True, # Use `BatchNorm1d` on continuous variables
    act_cls:ReLU=ReLU(inplace=True), # Activation type for [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop) layers
    lin_first:bool=True, # Linear layer is first or last in [`LinBnDrop`](https://docs.fast.ai/layers.html#linbndrop) layers
):

Convenience function to easily create a config for TabularModel.

Any direct setup of TabularModel’s internals should be passed through here:

config = tabular_config(embed_p=0.6, use_bn=False); config
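In spirit, `tabular_config` simply bundles the keyword arguments you pass into a dict that can later be forwarded to TabularModel (for example via a learner's `config` argument). A toy stand-in, ignoring the argument-name validation the real function performs, might look like:

```python
def tabular_config(**kwargs):
    # Collect only the explicitly supplied TabularModel keyword arguments
    return kwargs

config = tabular_config(embed_p=0.6, use_bn=False)
# -> {'embed_p': 0.6, 'use_bn': False}
```

The resulting dict contains only the settings you overrode, so TabularModel's own defaults apply to everything else.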