Tabular model

A basic model that can be used on tabular data

Embeddings


source

emb_sz_rule

 emb_sz_rule (n_cat:int)

Rule of thumb to pick embedding size corresponding to n_cat

|         | Type | Details                   |
|---------|------|---------------------------|
| n_cat   | int  | Cardinality of a category |
| Returns | int  |                           |

Through trial and error, this general rule takes the lower of two values:

  • A dimension space of 600
  • A dimension space equal to 1.6 times the cardinality of the variable raised to the power of 0.56.

This provides a good starting point for the embedding size of each of your variables. More advanced users who want to lean into this heuristic can tweak these values at their discretion; it is not uncommon for slight adjustments to this general formula to give better results.
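Concretely, the rule corresponds to min(600, round(1.6 * n_cat**0.56)). A minimal sketch of how it behaves for a few cardinalities (the exact values shown assume the default fastai implementation):

from fastai.tabular.model import emb_sz_rule
# roughly min(600, round(1.6 * n_cat**0.56))
emb_sz_rule(4)        # 3   -- small categories get small embeddings
emb_sz_rule(1_000)    # 77
emb_sz_rule(100_000)  # 600 -- capped for very high-cardinality variables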


source

get_emb_sz

 get_emb_sz (to:fastai.tabular.core.Tabular|fastai.tabular.core.TabularPandas,
             sz_dict:dict=None)

Get embedding size for each cat_name in Tabular or TabularPandas, or populate embedding size manually using sz_dict

|         | Type                     | Default | Details                                                                   |
|---------|--------------------------|---------|---------------------------------------------------------------------------|
| to      | Tabular \| TabularPandas |         |                                                                           |
| sz_dict | dict                     | None    | Dictionary of {'class_name': size, …} to override default emb_sz_rule     |
| Returns | list                     |         | List of embedding sizes for each category                                 |
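A minimal sketch of get_emb_sz on a toy TabularPandas (the DataFrame and column names here are made up for illustration; note that Categorify adds an #na# class, so the reported cardinality is one higher than the number of distinct values):

from fastai.tabular.all import TabularPandas, Categorify, get_emb_sz
import pandas as pd

df = pd.DataFrame({'a': ['x','y','z','x'], 'b': [1.0, 2.0, 3.0, 4.0], 'y': [0, 1, 0, 1]})
to = TabularPandas(df, procs=[Categorify], cat_names=['a'], cont_names=['b'], y_names='y')
get_emb_sz(to)                    # [(4, 3)]: cardinality 4 (x, y, z, #na#), size 3 from emb_sz_rule
get_emb_sz(to, sz_dict={'a': 8})  # [(4, 8)]: size overridden via sz_dict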

source

TabularModel

 TabularModel (emb_szs:list, n_cont:int, out_sz:int, layers:list,
               ps:float|MutableSequence=None, embed_p:float=0.0,
               y_range=None, use_bn:bool=True, bn_final:bool=False,
               bn_cont:bool=True, act_cls=ReLU(inplace=True),
               lin_first:bool=True)

Basic model for tabular data.

|           | Type                     | Default            | Details                                                                              |
|-----------|--------------------------|--------------------|--------------------------------------------------------------------------------------|
| emb_szs   | list                     |                    | Sequence of (num_embeddings, embedding_dim) for each categorical variable             |
| n_cont    | int                      |                    | Number of continuous variables                                                        |
| out_sz    | int                      |                    | Number of outputs for final LinBnDrop layer                                           |
| layers    | list                     |                    | Sequence of ints used to specify the input and output size of each LinBnDrop layer    |
| ps        | float \| MutableSequence | None               | Sequence of dropout probabilities for LinBnDrop                                       |
| embed_p   | float                    | 0.0                | Dropout probability for Embedding layer                                               |
| y_range   | NoneType                 | None               | Low and high for SigmoidRange activation                                              |
| use_bn    | bool                     | True               | Use BatchNorm1d in LinBnDrop layers                                                   |
| bn_final  | bool                     | False              | Use BatchNorm1d on final layer                                                        |
| bn_cont   | bool                     | True               | Use BatchNorm1d on continuous variables                                               |
| act_cls   | ReLU                     | ReLU(inplace=True) | Activation type for LinBnDrop layers                                                  |
| lin_first | bool                     | True               | Linear layer is first or last in LinBnDrop layers                                     |

This model expects your cat and cont variables to be passed separately. cat is passed through an Embedding layer and optional Dropout, while cont is passed through an optional BatchNorm1d. Afterwards both are concatenated and passed through a series of LinBnDrop layers, before a final Linear layer sized to the expected number of outputs.

emb_szs = [(4,2), (17,8)]  # (num_embeddings, embedding_dim) for each categorical variable
m = TabularModel(emb_szs, n_cont=2, out_sz=2, layers=[200,100]).eval()
x_cat = torch.tensor([[2,12]]).long()               # one row of categorical indices
x_cont = torch.tensor([[0.7633, -0.1887]]).float()  # one row of continuous values
out = m(x_cat, x_cont)
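When predicting a bounded continuous target, you can pass y_range to append a SigmoidRange activation after the final Linear layer. A minimal sketch reusing the tensors above (the (0, 5) bounds are just an illustrative assumption):

m_reg = TabularModel(emb_szs, n_cont=2, out_sz=1, layers=[200,100], y_range=(0,5)).eval()
m_reg(x_cat, x_cont)  # the single output is squashed into (0, 5)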

source

tabular_config

 tabular_config (ps:float|MutableSequence=None, embed_p:float=0.0,
                 y_range=None, use_bn:bool=True, bn_final:bool=False,
                 bn_cont:bool=True, act_cls=ReLU(inplace=True),
                 lin_first:bool=True)

Convenience function to easily create a config for TabularModel

|           | Type                     | Default            | Details                                           |
|-----------|--------------------------|--------------------|---------------------------------------------------|
| ps        | float \| MutableSequence | None               | Sequence of dropout probabilities for LinBnDrop   |
| embed_p   | float                    | 0.0                | Dropout probability for Embedding layer           |
| y_range   | NoneType                 | None               | Low and high for SigmoidRange activation          |
| use_bn    | bool                     | True               | Use BatchNorm1d in LinBnDrop layers               |
| bn_final  | bool                     | False              | Use BatchNorm1d on final layer                    |
| bn_cont   | bool                     | True               | Use BatchNorm1d on continuous variables           |
| act_cls   | ReLU                     | ReLU(inplace=True) | Activation type for LinBnDrop layers              |
| lin_first | bool                     | True               | Linear layer is first or last in LinBnDrop layers |

Any direct setup of TabularModel’s internals should be passed through here:

config = tabular_config(embed_p=0.6, use_bn=False); config
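The resulting dict is typically handed to tabular_learner, which forwards these options to TabularModel when it builds the model. A minimal sketch, assuming a TabularDataLoaders object named dls already exists:

# `dls` is assumed to be an existing TabularDataLoaders
learn = tabular_learner(dls, layers=[200,100], config=config)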