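```python
import torch
from fastai.tabular.model import TabularModel

emb_szs = [(4,2), (17,8)]  # (num_embeddings, embedding_dim) for each categorical variable
m = TabularModel(emb_szs, n_cont=2, out_sz=2, layers=[200,100]).eval()
x_cat = torch.tensor([[2,12]]).long()               # one row with two categorical codes
x_cont = torch.tensor([[0.7633, -0.1887]]).float()  # one row with two continuous values
out = m(x_cat, x_cont)                              # one row with out_sz=2 outputs
```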
Tabular model
Embeddings
emb_sz_rule
emb_sz_rule (n_cat:int)
Rule of thumb to pick embedding size corresponding to n_cat
| | Type | Details |
|---|---|---|
| n_cat | int | Cardinality of a category |
| Returns | int | |
This rule of thumb, arrived at through trial and error, takes the lower of two values:
- A dimension space of 600
- A dimension space equal to 1.6 times the cardinality of the variable raised to the power 0.56, rounded to an integer
This gives a good starting point for the embedding size of each variable. More advanced users who want to push this further can tweak these values at their discretion; it is not uncommon for slight adjustments to this general formula to give better results.
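A minimal plain-Python sketch of the rule as described above, useful for checking what size a given cardinality produces. Treat it as an illustration of the formula rather than a copy of fastai's source.

```python
def emb_sz_rule(n_cat: int) -> int:
    """Embedding size for a categorical variable with n_cat classes.

    Sketch of the rule of thumb: the smaller of 600 and 1.6 * n_cat**0.56,
    rounded to an integer.
    """
    return min(600, round(1.6 * n_cat ** 0.56))

for n in (4, 17, 1_000_000):
    print(n, emb_sz_rule(n))
# 4 3
# 17 8
# 1000000 600
```

The cap of 600 only matters for very high-cardinality variables: 1.6 * n_cat**0.56 first reaches 600 when n_cat is on the order of 40,000.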
get_emb_sz
get_emb_sz (to:fastai.tabular.core.Tabular|fastai.tabular.core.TabularPandas, sz_dict:dict=None)
Get embedding size for each cat_name in Tabular or TabularPandas, or populate embedding size manually using sz_dict
| | Type | Default | Details |
|---|---|---|---|
| to | fastai.tabular.core.Tabular \| fastai.tabular.core.TabularPandas | | |
| sz_dict | dict | None | Dictionary of {'class_name' : size, …} to override default emb_sz_rule |
| Returns | list | | List of embedding sizes for each category |
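To make the sz_dict override concrete, here is a plain-Python sketch of the behaviour described in the table. It assumes the classes of each categorical column are available as a dict from column name to list of classes, standing in for the Tabular object; sketch_get_emb_sz and the column names are hypothetical. It returns (n_cat, size) pairs, the (num_embeddings, embedding_dim) form that TabularModel's emb_szs expects.

```python
from typing import Dict, List, Optional, Sequence, Tuple

def emb_sz_rule(n_cat: int) -> int:
    # Rule of thumb: min(600, round(1.6 * n_cat**0.56))
    return min(600, round(1.6 * n_cat ** 0.56))

def sketch_get_emb_sz(classes: Dict[str, Sequence],
                      sz_dict: Optional[Dict[str, int]] = None) -> List[Tuple[int, int]]:
    """Hypothetical stand-in for get_emb_sz that works on a plain dict of classes."""
    sz_dict = sz_dict or {}
    sizes = []
    for name, cats in classes.items():
        n_cat = len(cats)
        # A manual size in sz_dict overrides the rule of thumb for that column
        sizes.append((n_cat, sz_dict.get(name, emb_sz_rule(n_cat))))
    return sizes

# Hypothetical columns with 4 and 17 classes
classes = {"color": ["a", "b", "c", "d"],
           "region": [f"r{i}" for i in range(17)]}
print(sketch_get_emb_sz(classes))                # [(4, 3), (17, 8)]
print(sketch_get_emb_sz(classes, {"color": 2}))  # [(4, 2), (17, 8)]
```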
TabularModel
TabularModel (emb_szs:list, n_cont:int, out_sz:int, layers:list, ps:float|MutableSequence=None, embed_p:float=0.0, y_range=None, use_bn:bool=True, bn_final:bool=False, bn_cont:bool=True, act_cls=ReLU(inplace=True), lin_first:bool=True)
Basic model for tabular data.
| | Type | Default | Details |
|---|---|---|---|
| emb_szs | list | | Sequence of (num_embeddings, embedding_dim) for each categorical variable |
| n_cont | int | | Number of continuous variables |
| out_sz | int | | Number of outputs for final LinBnDrop layer |
| layers | list | | Sequence of ints used to specify the input and output size of each LinBnDrop layer |
| ps | float \| collections.abc.MutableSequence | None | Sequence of dropout probabilities for LinBnDrop |
| embed_p | float | 0.0 | Dropout probability for Embedding layer |
| y_range | NoneType | None | Low and high for SigmoidRange activation |
| use_bn | bool | True | Use BatchNorm1d in LinBnDrop layers |
| bn_final | bool | False | Use BatchNorm1d on final layer |
| bn_cont | bool | True | Use BatchNorm1d on continuous variables |
| act_cls | ReLU | ReLU(inplace=True) | Activation type for LinBnDrop layers |
| lin_first | bool | True | Linear layer is first or last in LinBnDrop layers |
This model expects your cat and cont variables to be passed separately. cat is passed through an Embedding layer and optional Dropout, while cont is passed through an optional BatchNorm1d. Afterwards, both are concatenated and passed through a series of LinBnDrop layers, before a final Linear layer corresponding to the expected outputs.
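As a rough illustration of the data flow just described, the sketch below only tracks tensor widths; it does not build any layers. It assumes the first LinBnDrop layer receives the concatenated embedding outputs plus the continuous variables and that the layer widths then chain through layers to out_sz; layer_sizes is a hypothetical helper, not part of fastai.

```python
def layer_sizes(emb_szs, n_cont, out_sz, layers):
    """Return (in_features, out_features) for each layer after the concat step.

    Hypothetical helper: width bookkeeping only, no layers are created.
    """
    # Each categorical variable contributes embedding_dim features
    n_emb = sum(dim for _, dim in emb_szs)
    # Embedding outputs are concatenated with the continuous variables
    sizes = [n_emb + n_cont] + list(layers) + [out_sz]
    # Consecutive widths give each layer's input and output size
    return list(zip(sizes[:-1], sizes[1:]))

print(layer_sizes([(4, 2), (17, 8)], n_cont=2, out_sz=2, layers=[200, 100]))
# [(12, 200), (200, 100), (100, 2)]
```

With the emb_szs, n_cont, out_sz and layers used in the TabularModel example on this page, the 2 + 8 embedding features plus 2 continuous features give an input width of 12, and the last pair maps to the output layer.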
tabular_config
tabular_config (ps:float|MutableSequence=None, embed_p:float=0.0, y_range=None, use_bn:bool=True, bn_final:bool=False, bn_cont:bool=True, act_cls=ReLU(inplace=True), lin_first:bool=True)
Convenience function to easily create a config for TabularModel
| | Type | Default | Details |
|---|---|---|---|
| ps | float \| collections.abc.MutableSequence | None | Sequence of dropout probabilities for LinBnDrop |
| embed_p | float | 0.0 | Dropout probability for Embedding layer |
| y_range | NoneType | None | Low and high for SigmoidRange activation |
| use_bn | bool | True | Use BatchNorm1d in LinBnDrop layers |
| bn_final | bool | False | Use BatchNorm1d on final layer |
| bn_cont | bool | True | Use BatchNorm1d on continuous variables |
| act_cls | ReLU | ReLU(inplace=True) | Activation type for LinBnDrop layers |
| lin_first | bool | True | Linear layer is first or last in LinBnDrop layers |
Any direct setup of TabularModel's internals should be passed through here:
```python
from fastai.tabular.model import tabular_config
config = tabular_config(embed_p=0.6, use_bn=False)  # {'embed_p': 0.6, 'use_bn': False}
```
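The returned config is a dict meant to be unpacked as keyword arguments into TabularModel. The sketch below shows that pattern with hypothetical stand-ins (make_config and ModelStub) so it runs without fastai; ModelStub only records a few of the keyword arguments listed in the table above and builds no layers.

```python
def make_config(**kwargs) -> dict:
    # Hypothetical stand-in for tabular_config: collect keyword arguments into a dict
    return kwargs

class ModelStub:
    # Hypothetical stub mirroring TabularModel's keyword arguments and defaults
    def __init__(self, emb_szs, n_cont, out_sz, layers, ps=None, embed_p=0.0,
                 y_range=None, use_bn=True, bn_final=False, bn_cont=True, lin_first=True):
        self.embed_p = embed_p
        self.use_bn = use_bn
        self.bn_cont = bn_cont

config = make_config(embed_p=0.6, use_bn=False)
model = ModelStub([(4, 2), (17, 8)], n_cont=2, out_sz=2, layers=[200, 100], **config)
print(config)                                     # {'embed_p': 0.6, 'use_bn': False}
print(model.embed_p, model.use_bn, model.bn_cont)  # 0.6 False True
```

Options left out of the config keep the defaults from the signature; here bn_cont stays True.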