Text learner
Learner suitable for transfer learning in NLP
The most important functions of this module are language_model_learner and text_classifier_learner. They will help you define a Learner using a pretrained model. See the text tutorial for examples of use.
Loading a pretrained model
In text, to load a pretrained model, we need to adapt the embeddings of the vocabulary used for the pre-training to the vocabulary of our current corpus.
match_embeds
match_embeds (old_wgts:dict, old_vocab:list, new_vocab:list)
Convert the embedding in old_wgts to go from old_vocab to new_vocab.
| | Type | Details |
|---|---|---|
| old_wgts | dict | Embedding weights |
| old_vocab | list | Vocabulary of corpus used for pre-training |
| new_vocab | list | Current corpus vocabulary |
| Returns | dict | |
For words in new_vocab that don’t have a corresponding match in old_vocab, we use the mean of all pretrained embeddings.
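For instance, with a pretraining vocabulary of ['a', 'b', 'c'] and a new vocabulary of ['a', 'c', 'd', 'b'], the rows for 'a', 'b' and 'c' are copied over while the new word 'd' gets the mean of the pretrained embeddings:

```python
wgts = {'0.encoder.weight': torch.randn(5,3)}
new_wgts = match_embeds(wgts.copy(), ['a', 'b', 'c'], ['a', 'c', 'd', 'b'])
old,new = wgts['0.encoder.weight'],new_wgts['0.encoder.weight']
test_eq(new[0], old[0])
test_eq(new[1], old[2])
test_eq(new[2], old.mean(0))
test_eq(new[3], old[1])
```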
load_ignore_keys
load_ignore_keys (model, wgts:dict)
Load wgts in model ignoring the names of the keys, just taking parameters in order
| | Type | Details |
|---|---|---|
| model | | Model architecture |
| wgts | dict | Model weights |
| Returns | tuple | |
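A minimal sketch of the behavior (the model and the renamed keys below are made up for illustration):

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 4), nn.Linear(4, 2))
# Same tensors as the model's state dict, but stored under unrelated key names
wgts = {f'renamed_{i}': p.clone() for i, p in enumerate(model.state_dict().values())}
load_ignore_keys(model, wgts)  # parameters are matched purely by position, not by key
```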
clean_raw_keys
clean_raw_keys (wgts:dict)
load_model_text
load_model_text (file:str, model, opt:fastai.optimizer.Optimizer, with_opt:bool=None, device:int|str|torch.device=None, strict:bool=True, **kwargs)
Load model from file along with opt (if available, and if with_opt)
| | Type | Default | Details |
|---|---|---|---|
| file | str | | File name of saved text model |
| model | | | Model architecture |
| opt | Optimizer | | Optimizer used to fit the model |
| with_opt | bool | None | Enable to load Optimizer state |
| device | int \| str \| torch.device | None | Sets the device, uses 'cpu' if unspecified |
| strict | bool | True | Whether to strictly enforce that the keys of the file's state dict match the model's Module.state_dict |
| kwargs | VAR_KEYWORD | | |
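A hypothetical usage sketch (the checkpoint name is made up, and the model and optimizer must match what was saved):

```python
# 'model_text.pth' is assumed to be a previously saved checkpoint
model = get_language_model(AWD_LSTM, vocab_sz=1000)
opt = Adam(model.parameters(), lr=1e-3)
load_model_text('models/model_text.pth', model, opt, with_opt=True, device='cpu')
```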
TextLearner
TextLearner (dls:DataLoaders, model, alpha:float=2.0, beta:float=1.0, moms:tuple=(0.8, 0.7, 0.8), loss_func:Callable|None=None, opt_func:Optimizer|OptimWrapper=<function Adam>, lr:float|slice=0.001, splitter:Callable=<function trainable_params>, cbs:Callback|MutableSequence|None=None, metrics:Callable|MutableSequence|None=None, path:str|Path|None=None, model_dir:str|Path='models', wd:float|int|None=None, wd_bn_bias:bool=False, train_bn:bool=True, default_cbs:bool=True)
Basic class for a Learner in NLP.
| | Type | Default | Details |
|---|---|---|---|
| dls | DataLoaders | | Text DataLoaders |
| model | | | A standard PyTorch model |
| alpha | float | 2.0 | Param for RNNRegularizer |
| beta | float | 1.0 | Param for RNNRegularizer |
| moms | tuple | (0.8, 0.7, 0.8) | Momentum for Cosine Annealing Scheduler |
| loss_func | Optional | None | Loss function. Defaults to dls loss |
| opt_func | fastai.optimizer.Optimizer \| fastai.optimizer.OptimWrapper | Adam | Optimization function for training |
| lr | float \| slice | 0.001 | Default learning rate |
| splitter | Callable | trainable_params | Split model into parameter groups. Defaults to one parameter group |
| cbs | fastai.callback.core.Callback \| collections.abc.MutableSequence \| None | None | Callbacks to add to Learner |
| metrics | Union | None | Metrics to calculate on validation set |
| path | str \| pathlib.Path \| None | None | Parent directory to save, load, and export models. Defaults to dls path |
| model_dir | str \| pathlib.Path | models | Subdirectory to save and load models |
| wd | float \| int \| None | None | Default weight decay |
| wd_bn_bias | bool | False | Apply weight decay to normalization and bias parameters |
| train_bn | bool | True | Train frozen normalization layers |
| default_cbs | bool | True | Include default Callbacks |
```python
rnn_cbs(2., 1.)
```

    [ModelResetter, RNNCallback, RNNRegularizer]

Adds a ModelResetter and an RNNRegularizer with alpha and beta to the callbacks; the rest is the same as the Learner init.
This Learner adds functionality to the base class:
TextLearner.load_pretrained
TextLearner.load_pretrained (wgts_fname:str, vocab_fname:str, model=None)
Load a pretrained model and adapt it to the data vocabulary.
| | Type | Default | Details |
|---|---|---|---|
| wgts_fname | str | | Filename of saved weights |
| vocab_fname | str | | Saved vocabulary filename in pickle format |
| model | NoneType | None | Model to load parameters from, defaults to Learner.model |
wgts_fname should point to the weights of the pretrained model and vocab_fname to the vocabulary used to pretrain it.
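For example (the file names here are hypothetical):

```python
# Weights saved as a .pth file and the matching vocabulary pickled alongside them
learn.load_pretrained('pretrained_wgts.pth', 'pretrained_vocab.pkl')
```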
TextLearner.save_encoder
TextLearner.save_encoder (file:str)
Save the encoder to file in the model directory
| | Type | Details |
|---|---|---|
| file | str | Filename for Encoder |
The model directory is Learner.path/Learner.model_dir.
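For instance (the file name is arbitrary):

```python
learn.save_encoder('finetuned_encoder')  # written inside learn.path/learn.model_dir
```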
TextLearner.load_encoder
TextLearner.load_encoder (file:str, device:int|str|torch.device=None)
Load the encoder file from the model directory, optionally ensuring it’s on device
| | Type | Default | Details |
|---|---|---|---|
| file | str | | Filename of the saved encoder |
| device | int \| str \| torch.device | None | Device used to load, defaults to dls device |
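For instance, to reload an encoder saved with save_encoder (hypothetical file name):

```python
learn.load_encoder('finetuned_encoder', device='cpu')
```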
Language modeling predictions
For language modeling, the predict method is quite different from the other applications, which is why it needs its own subclass.
decode_spec_tokens
decode_spec_tokens (tokens)
Decode the special tokens in tokens
```python
test_eq(decode_spec_tokens(['xxmaj', 'text']), ['Text'])
test_eq(decode_spec_tokens(['xxup', 'text']), ['TEXT'])
test_eq(decode_spec_tokens(['xxrep', '3', 'a']), ['aaa'])
test_eq(decode_spec_tokens(['xxwrep', '3', 'word']), ['word', 'word', 'word'])
```

LMLearner
LMLearner (dls:DataLoaders, model, alpha:float=2.0, beta:float=1.0, moms:tuple=(0.8, 0.7, 0.8), loss_func:Callable|None=None, opt_func:Optimizer|OptimWrapper=<function Adam>, lr:float|slice=0.001, splitter:Callable=<function trainable_params>, cbs:Callback|MutableSequence|None=None, metrics:Callable|MutableSequence|None=None, path:str|Path|None=None, model_dir:str|Path='models', wd:float|int|None=None, wd_bn_bias:bool=False, train_bn:bool=True, default_cbs:bool=True)
Add functionality to TextLearner when dealing with a language model
| | Type | Default | Details |
|---|---|---|---|
| dls | DataLoaders | | Text DataLoaders |
| model | | | A standard PyTorch model |
| alpha | float | 2.0 | Param for RNNRegularizer |
| beta | float | 1.0 | Param for RNNRegularizer |
| moms | tuple | (0.8, 0.7, 0.8) | Momentum for Cosine Annealing Scheduler |
| loss_func | Optional | None | Loss function. Defaults to dls loss |
| opt_func | fastai.optimizer.Optimizer \| fastai.optimizer.OptimWrapper | Adam | Optimization function for training |
| lr | float \| slice | 0.001 | Default learning rate |
| splitter | Callable | trainable_params | Split model into parameter groups. Defaults to one parameter group |
| cbs | fastai.callback.core.Callback \| collections.abc.MutableSequence \| None | None | Callbacks to add to Learner |
| metrics | Union | None | Metrics to calculate on validation set |
| path | str \| pathlib.Path \| None | None | Parent directory to save, load, and export models. Defaults to dls path |
| model_dir | str \| pathlib.Path | models | Subdirectory to save and load models |
| wd | float \| int \| None | None | Default weight decay |
| wd_bn_bias | bool | False | Apply weight decay to normalization and bias parameters |
| train_bn | bool | True | Train frozen normalization layers |
| default_cbs | bool | True | Include default Callbacks |
LMLearner.predict
LMLearner.predict (text, n_words=1, no_unk=True, temperature=1.0, min_p=None, no_bar=False, decoder=<function decode_spec_tokens>, only_last_word=False)
Return text and the n_words that come after
The words are picked at random from the predictions, weighted by the probability of each index. no_unk means we never pick the UNK token, temperature is applied to the predictions, and if min_p is passed, indices with a probability lower than it are not considered. Set no_bar to True if you don't want any progress bar, and you can pass along a custom decoder to process the predicted tokens.
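A hypothetical call, assuming learn is an LMLearner such as the one built below:

```python
# Sample 15 words with a lower temperature and a probability floor, without a progress bar
learn.predict('The film was', n_words=15, temperature=0.75, min_p=0.01, no_bar=True)
```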
Learner convenience functions
language_model_learner
language_model_learner (dls, arch, config=None, drop_mult=1.0, backwards=False, pretrained=True, pretrained_fnames=None, loss_func:Callable|None=None, opt_func:Optimizer|OptimWrapper=<function Adam>, lr:float|slice=0.001, splitter:Callable=<function trainable_params>, cbs:Callback|MutableSequence|None=None, metrics:Callable|MutableSequence|None=None, path:str|Path|None=None, model_dir:str|Path='models', wd:float|int|None=None, wd_bn_bias:bool=False, train_bn:bool=True, moms:tuple=(0.95, 0.85, 0.95), default_cbs:bool=True)
Create a Learner with a language model from dls and arch.
| | Type | Default | Details |
|---|---|---|---|
| dls | DataLoaders | | DataLoaders containing fastai or PyTorch DataLoaders |
| arch | | | |
| config | NoneType | None | |
| drop_mult | float | 1.0 | |
| backwards | bool | False | |
| pretrained | bool | True | |
| pretrained_fnames | NoneType | None | |
| loss_func | Optional | None | Loss function. Defaults to dls loss |
| opt_func | fastai.optimizer.Optimizer \| fastai.optimizer.OptimWrapper | Adam | Optimization function for training |
| lr | float \| slice | 0.001 | Default learning rate |
| splitter | Callable | trainable_params | Split model into parameter groups. Defaults to one parameter group |
| cbs | fastai.callback.core.Callback \| collections.abc.MutableSequence \| None | None | Callbacks to add to Learner |
| metrics | Union | None | Metrics to calculate on validation set |
| path | str \| pathlib.Path \| None | None | Parent directory to save, load, and export models. Defaults to dls path |
| model_dir | str \| pathlib.Path | models | Subdirectory to save and load models |
| wd | float \| int \| None | None | Default weight decay |
| wd_bn_bias | bool | False | Apply weight decay to normalization and bias parameters |
| train_bn | bool | True | Train frozen normalization layers |
| moms | tuple | (0.95, 0.85, 0.95) | Default momentum for schedulers |
| default_cbs | bool | True | Include default Callbacks |
You can use the config to customize the architecture used (change the values from awd_lstm_lm_config for this). pretrained will use fastai's pretrained model for this arch (if available), or you can pass specific pretrained_fnames containing your own pretrained model and the corresponding vocabulary. All other arguments are passed to Learner.
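For instance, a minimal sketch of customizing the config (the overridden values are arbitrary; pretrained is disabled because the resized layers no longer match fastai's pretrained weights, and dls is assumed to be a language-model TextDataLoaders such as the one built below):

```python
config = awd_lstm_lm_config.copy()
config.update({'emb_sz': 300, 'n_hid': 1000})  # arbitrary example sizes
learn = language_model_learner(dls, AWD_LSTM, config=config, drop_mult=0.7,
                               pretrained=False)  # custom sizes can't reuse the pretrained weights
```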
```python
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', is_lm=True, valid_col='is_valid')
learn = language_model_learner(dls, AWD_LSTM)
```

You can then use the .predict method to generate new text.

```python
learn.predict('This movie is about', n_words=20)
```

    'This movie is about plans by Tom Cruise to win a loyalty sharing award at the Battle of Christmas'

By default the entire sentence is fed to the model again after each predicted word; this little trick improves the quality of the generated text. If you want to feed only the last word, pass only_last_word=True.

```python
learn.predict('This movie is about', n_words=20, only_last_word=True)
```

    'This movie is about the J. Intelligent , ha - agency . Griffith , and Games on the early after'
text_classifier_learner
text_classifier_learner (dls, arch, seq_len=72, config=None, backwards=False, pretrained=True, drop_mult=0.5, n_out=None, lin_ftrs=None, ps=None, max_len=1440, y_range=None, loss_func:Callable|None=None, opt_func:Optimizer|OptimWrapper=<function Adam>, lr:float|slice=0.001, splitter:Callable=<function trainable_params>, cbs:Callback|MutableSequence|None=None, metrics:Callable|MutableSequence|None=None, path:str|Path|None=None, model_dir:str|Path='models', wd:float|int|None=None, wd_bn_bias:bool=False, train_bn:bool=True, moms:tuple=(0.95, 0.85, 0.95), default_cbs:bool=True)
Create a Learner with a text classifier from dls and arch.
| | Type | Default | Details |
|---|---|---|---|
| dls | DataLoaders | | DataLoaders containing fastai or PyTorch DataLoaders |
| arch | | | |
| seq_len | int | 72 | |
| config | NoneType | None | |
| backwards | bool | False | |
| pretrained | bool | True | |
| drop_mult | float | 0.5 | |
| n_out | NoneType | None | |
| lin_ftrs | NoneType | None | |
| ps | NoneType | None | |
| max_len | int | 1440 | |
| y_range | NoneType | None | |
| loss_func | Optional | None | Loss function. Defaults to dls loss |
| opt_func | fastai.optimizer.Optimizer \| fastai.optimizer.OptimWrapper | Adam | Optimization function for training |
| lr | float \| slice | 0.001 | Default learning rate |
| splitter | Callable | trainable_params | Split model into parameter groups. Defaults to one parameter group |
| cbs | fastai.callback.core.Callback \| collections.abc.MutableSequence \| None | None | Callbacks to add to Learner |
| metrics | Union | None | Metrics to calculate on validation set |
| path | str \| pathlib.Path \| None | None | Parent directory to save, load, and export models. Defaults to dls path |
| model_dir | str \| pathlib.Path | models | Subdirectory to save and load models |
| wd | float \| int \| None | None | Default weight decay |
| wd_bn_bias | bool | False | Apply weight decay to normalization and bias parameters |
| train_bn | bool | True | Train frozen normalization layers |
| moms | tuple | (0.95, 0.85, 0.95) | Default momentum for schedulers |
| default_cbs | bool | True | Include default Callbacks |
You can use the config to customize the architecture used (change the values from awd_lstm_clas_config for this), pretrained will use fastai’s pretrained model for this arch (if available). drop_mult is a global multiplier applied to control all dropouts. n_out is usually inferred from the dls but you may pass it.
The model uses a SentenceEncoder, which means the texts are passed seq_len tokens at a time, and gradients are only computed on the last max_len steps. lin_ftrs and ps are passed to get_text_classifier.
All other arguments are passed to Learner.
```python
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', label_col='label', valid_col='is_valid')
learn = text_classifier_learner(dls, AWD_LSTM)
```
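If you need to customize the architecture or the classification head, a minimal sketch reusing the dls above (the values are arbitrary examples):

```python
config = awd_lstm_clas_config.copy()
config.update({'output_p': 0.5})               # arbitrary example value
learn = text_classifier_learner(dls, AWD_LSTM, config=config, drop_mult=0.7,
                                lin_ftrs=[50], ps=[0.1],  # one hidden layer of size 50 in the head
                                max_len=72*20)            # consider at most 1440 tokens for the gradients
```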