Text learner

All the functions necessary to build a Learner suitable for transfer learning in NLP.

The most important functions of this module are language_model_learner and text_classifier_learner. They will help you define a Learner using a pretrained model. See the text tutorial for examples of use.

Loading a pretrained model

In text, to load a pretrained model, we need to adapt the embeddings of the vocabulary used for the pre-training to the vocabulary of our current corpus.

match_embeds

 match_embeds (old_wgts:dict, old_vocab:list, new_vocab:list)

Convert the embedding in old_wgts to go from old_vocab to new_vocab.

	Type	Details
old_wgts	dict	Embedding weights
old_vocab	list	Vocabulary of corpus used for pre-training
new_vocab	list	Current corpus vocabulary
Returns	dict

For words in new_vocab that don’t have a corresponding match in old_vocab, we use the mean of all pretrained embeddings.

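import torch
from fastcore.test import test_eq
from fastai.text.learner import match_embeds

# 5 pretrained embedding rows of size 3; copy the dict so `wgts` keeps the old tensor
wgts = {'0.encoder.weight': torch.randn(5,3)}
new_wgts = match_embeds(wgts.copy(), ['a', 'b', 'c'], ['a', 'c', 'd', 'b'])
old,new = wgts['0.encoder.weight'],new_wgts['0.encoder.weight']
test_eq(new[0], old[0])       # 'a' keeps its row
test_eq(new[1], old[2])       # 'c' moves from row 2 to row 1
test_eq(new[2], old.mean(0))  # 'd' is unknown: mean of all pretrained rows
test_eq(new[3], old[1])       # 'b' moves from row 1 to row 3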
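
The remapping rule is easy to see on plain Python lists. The sketch below is not fastai's code: `match_rows` is a hypothetical helper that remaps a single embedding matrix stored as nested lists, and it does not model anything else match_embeds may update in the weights dict.

```python
from typing import List, Sequence


def match_rows(old_rows: List[List[float]], old_vocab: Sequence[str],
               new_vocab: Sequence[str]) -> List[List[float]]:
    """Remap embedding rows from old_vocab order to new_vocab order.

    Tokens missing from old_vocab get the mean of *all* pretrained rows,
    which may include rows past the end of old_vocab (as in the test above).
    """
    n_rows, n_dims = len(old_rows), len(old_rows[0])
    mean_row = [sum(row[d] for row in old_rows) / n_rows for d in range(n_dims)]
    old_index = {tok: i for i, tok in enumerate(old_vocab)}
    new_rows = []
    for tok in new_vocab:
        i = old_index.get(tok)
        # copy each row so the result never aliases the input lists
        new_rows.append(list(old_rows[i]) if i is not None else list(mean_row))
    return new_rows


# 5 pretrained rows but only 3 named tokens, mirroring the test above
old = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [0.0, 0.0], [4.0, 4.0]]
new = match_rows(old, ['a', 'b', 'c'], ['a', 'c', 'd', 'b'])
print(new)  # [[1.0, 0.0], [5.0, 5.0], [2.0, 2.0], [0.0, 1.0]]
```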

load_ignore_keys

 load_ignore_keys (model, wgts:dict)

Load wgts into model, ignoring the names of the keys and just taking the parameters in order.

	Type	Details
model		Model architecture
wgts	dict	Model weights
Returns	tuple
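
Pairing by position means only the order and count of the entries matter, not their names. A minimal sketch, with strings standing in for tensors and plain dicts (which keep insertion order) standing in for state dicts; `load_by_position` and all key names are hypothetical. We chose to raise on a length mismatch rather than silently pairing the shorter sequence.

```python
from typing import Any, Dict


def load_by_position(model_state: Dict[str, Any], wgts: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the values of wgts into the slots of model_state, pairing entries
    by position and ignoring the names used in wgts."""
    if len(model_state) != len(wgts):
        raise ValueError(f"model has {len(model_state)} entries, weights have {len(wgts)}")
    # dicts keep insertion order, so zip pairs first with first, second with second, etc.
    return {model_key: wgts[wgt_key] for model_key, wgt_key in zip(model_state, wgts)}


model_state = {'encoder.weight': 'W_enc', 'rnn.weight': 'W_rnn'}      # names in the current model
saved = {'0.encoder.weight': 'A', '0.rnns.0.weight_hh_l0_raw': 'B'}  # names in the saved file
print(load_by_position(model_state, saved))
# {'encoder.weight': 'A', 'rnn.weight': 'B'}
```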

clean_raw_keys

 clean_raw_keys (wgts:dict)

load_model_text

 load_model_text (file:str, model, opt:fastai.optimizer.Optimizer,
                  with_opt:bool=None, device:int|str|torch.device=None,
                  strict:bool=True, **kwargs)

Load model from file along with opt (if available, and if with_opt)

	Type	Default	Details
file	str		File name of saved text model
model			Model architecture
opt	Optimizer		`Optimizer` used to fit the model
with_opt	bool	None	Enable to load `Optimizer` state
device	int \| str \| torch.device	None	Sets the device, uses ‘cpu’ if unspecified
strict	bool	True	Whether to strictly enforce that the keys of `file`’s state dict match the model’s `Module.state_dict`
kwargs	VAR_KEYWORD
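
The phrase "(if available, and if with_opt)" can be read as the decision sketched below. This assumes that a checkpoint saved together with its optimizer is a dict with exactly the keys 'model' and 'opt', and that with_opt=None means "load the optimizer state if it is present"; `split_checkpoint` is a hypothetical helper, not part of fastai.

```python
import warnings
from typing import Any, Dict, Optional, Tuple


def split_checkpoint(state: Dict[str, Any], with_opt: Optional[bool] = None
                     ) -> Tuple[Dict[str, Any], Optional[Any]]:
    """Return (model_state, opt_state or None) for a loaded checkpoint.

    Assumption: a checkpoint that includes the optimizer has exactly the keys
    'model' and 'opt'; anything else is treated as a bare model state dict.
    """
    has_opt = set(state) == {'model', 'opt'}
    model_state = state['model'] if has_opt else state
    if has_opt and with_opt is not False:  # None means "load it if it is there"
        return model_state, state['opt']
    if with_opt and not has_opt:
        warnings.warn("Saved file doesn't contain an optimizer state.")
    return model_state, None


print(split_checkpoint({'model': {'w': 1}, 'opt': {'lr': 0.1}}))
# ({'w': 1}, {'lr': 0.1})
```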

TextLearner

 TextLearner (dls:DataLoaders, model, alpha:float=2.0, beta:float=1.0,
              moms:tuple=(0.8, 0.7, 0.8), loss_func:Callable|None=None,
              opt_func:Optimizer|OptimWrapper=<function Adam>,
              lr:float|slice=0.001, splitter:Callable=<function
              trainable_params>, cbs:Callback|MutableSequence|None=None,
              metrics:Callable|MutableSequence|None=None,
              path:str|Path|None=None, model_dir:str|Path='models',
              wd:float|int|None=None, wd_bn_bias:bool=False,
              train_bn:bool=True, default_cbs:bool=True)

Basic class for a Learner in NLP.

	Type	Default	Details
dls	DataLoaders		Text `DataLoaders`
model			A standard PyTorch model
alpha	float	2.0	Param for `RNNRegularizer`
beta	float	1.0	Param for `RNNRegularizer`
moms	tuple	(0.8, 0.7, 0.8)	Momentum for `Cosine Annealing Scheduler`
loss_func	Optional	None	Loss function. Defaults to `dls` loss
opt_func	fastai.optimizer.Optimizer \| fastai.optimizer.OptimWrapper	Adam	Optimization function for training
lr	float \| slice	0.001	Default learning rate
splitter	Callable	trainable_params	Split model into parameter groups. Defaults to one parameter group
cbs	fastai.callback.core.Callback \| collections.abc.MutableSequence \| None	None	`Callback`s to add to `Learner`
metrics	Union	None	`Metric`s to calculate on validation set
path	str \| pathlib.Path \| None	None	Parent directory to save, load, and export models. Defaults to `dls` `path`
model_dir	str \| pathlib.Path	models	Subdirectory to save and load models
wd	float \| int \| None	None	Default weight decay
wd_bn_bias	bool	False	Apply weight decay to normalization and bias parameters
train_bn	bool	True	Train frozen normalization layers
default_cbs	bool	True	Include default `Callback`s

rnn_cbs(2., 1.)

[ModelResetter, RNNCallback, RNNRegularizer]

Adds a ModelResetter, an RNNCallback and an RNNRegularizer (with alpha and beta) to the callbacks; the rest is the same as Learner init.
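
alpha and beta weight two extra penalty terms added to the training loss. A minimal sketch, assuming the activation regularization (AR) and temporal activation regularization (TAR) formulation used with AWD-LSTM: AR penalizes large dropped-out outputs, and TAR penalizes large changes in the raw outputs between consecutive time steps. `rnn_penalty` is hypothetical and handles a single sequence stored as nested lists.

```python
from typing import List

Seq = List[List[float]]  # one sequence: time steps x hidden units


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def rnn_penalty(out: Seq, raw_out: Seq, alpha: float, beta: float) -> float:
    """Sketch of an AR/TAR penalty (assumed formulation, not fastai's code).

    AR:  alpha * mean(out ** 2)                         keeps activations small
    TAR: beta  * mean((raw_out[t] - raw_out[t-1]) ** 2)  keeps them smooth in time
    """
    penalty = 0.0
    if alpha:
        penalty += alpha * _mean([h * h for step in out for h in step])
    if beta and len(raw_out) > 1:
        diffs = [(b - a) ** 2
                 for prev, cur in zip(raw_out[:-1], raw_out[1:])
                 for a, b in zip(prev, cur)]
        penalty += beta * _mean(diffs)
    return penalty


out = [[1.0, -1.0], [2.0, 0.0]]     # dropped-out outputs, 2 steps x 2 units
raw_out = [[0.0, 0.0], [1.0, 2.0]]  # raw outputs before dropout
print(rnn_penalty(out, raw_out, alpha=2.0, beta=1.0))  # 2*1.5 + 1*2.5 = 5.5
```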

This Learner adds functionality to the base class:

TextLearner.load_pretrained

 TextLearner.load_pretrained (wgts_fname:str, vocab_fname:str, model=None)

Load a pretrained model and adapt it to the data vocabulary.

	Type	Default	Details
wgts_fname	str		Filename of saved weights
vocab_fname	str		Saved vocabulary filename in pickle format
model	NoneType	None	Model to load parameters from, defaults to `Learner.model`

wgts_fname should point to the weights of the pretrained model and vocab_fname to the vocabulary used to pretrain it.
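
A minimal sketch of the vocabulary side, assuming the pickle holds a plain list of tokens in index order (the same shape as the old_vocab list that match_embeds takes). It writes such a file, reads it back, and splits a new vocabulary into tokens that have a pretrained row and tokens that would get the mean embedding. The helper names, file name and tokens are all made up for illustration.

```python
import os
import pickle
import tempfile
from typing import List, Tuple


def save_vocab(tokens: List[str], fname: str) -> None:
    with open(fname, 'wb') as f:
        pickle.dump(tokens, f)


def load_vocab(fname: str) -> List[str]:
    with open(fname, 'rb') as f:
        return pickle.load(f)


def vocab_overlap(old_vocab: List[str], new_vocab: List[str]) -> Tuple[List[str], List[str]]:
    """Split new_vocab into tokens found in old_vocab and tokens that
    would fall back to the mean embedding."""
    known = set(old_vocab)
    found = [t for t in new_vocab if t in known]
    missing = [t for t in new_vocab if t not in known]
    return found, missing


with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'pretrained_vocab.pkl')  # hypothetical file name
    save_vocab(['xxunk', 'xxpad', 'the', 'movie'], fname)
    old_vocab = load_vocab(fname)

print(vocab_overlap(old_vocab, ['xxunk', 'xxpad', 'the', 'film']))
# (['xxunk', 'xxpad', 'the'], ['film'])
```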

TextLearner.save_encoder

 TextLearner.save_encoder (file:str)

Save the encoder to file in the model directory

	Type	Details
file	str	Filename for `Encoder`

The model directory is Learner.path/Learner.model_dir.

TextLearner.load_encoder

 TextLearner.load_encoder (file:str, device:int|str|torch.device=None)

Load the encoder file from the model directory, optionally ensuring it’s on device

	Type	Default	Details
file	str		Filename of the saved encoder
device	int \| str \| torch.device	None	Device used to load, defaults to `dls` device

Language modeling predictions

For language modeling, the predict method is quite different from the other applications, which is why it needs its own subclass.

decode_spec_tokens

 decode_spec_tokens (tokens)

Decode the special tokens in tokens

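from fastcore.test import test_eq
from fastai.text.learner import decode_spec_tokens

test_eq(decode_spec_tokens(['xxmaj', 'text']), ['Text'])                         # capitalize next token
test_eq(decode_spec_tokens(['xxup', 'text']), ['TEXT'])                          # uppercase next token
test_eq(decode_spec_tokens(['xxrep', '3', 'a']), ['aaa'])                        # repeat a character
test_eq(decode_spec_tokens(['xxwrep', '3', 'word']), ['word', 'word', 'word'])   # repeat a word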
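
The four tests above pin down what each special token does. A re-implementation sketch that passes them is shown below; it is an illustration, not fastai's source, and `decode_tokens_sketch` is a hypothetical name. Its handling of a malformed repeat count (the token is kept and the rule dropped) is our own choice.

```python
from typing import List, Optional

# Special-token strings, taken from the tests above
TK_MAJ, TK_UP, TK_REP, TK_WREP = 'xxmaj', 'xxup', 'xxrep', 'xxwrep'


def decode_tokens_sketch(tokens: List[str]) -> List[str]:
    """Each rule token modifies the token(s) that follow it."""
    out: List[str] = []
    rule: Optional[str] = None
    count: Optional[int] = None
    for tok in tokens:
        if tok in (TK_MAJ, TK_UP, TK_REP, TK_WREP):
            rule, count = tok, None
        elif rule is None:
            out.append(tok)
        elif rule == TK_MAJ:
            out.append(tok[:1].upper() + tok[1:])
            rule = None
        elif rule == TK_UP:
            out.append(tok.upper())
            rule = None
        elif count is None:
            # xxrep / xxwrep: the next token is the repeat count
            try:
                count = int(tok)
            except ValueError:
                out.append(tok)  # malformed count: keep the token, drop the rule
                rule = None
        else:
            if rule == TK_REP:
                out.append(tok * count)    # repeat characters: 'a' -> 'aaa'
            else:
                out.extend([tok] * count)  # repeat whole words
            rule, count = None, None
    return out


print(decode_tokens_sketch(['xxmaj', 'this', 'movie', 'xxrep', '3', '!']))
# ['This', 'movie', '!!!']
```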

LMLearner

 LMLearner (dls:DataLoaders, model, alpha:float=2.0, beta:float=1.0,
            moms:tuple=(0.8, 0.7, 0.8), loss_func:Callable|None=None,
            opt_func:Optimizer|OptimWrapper=<function Adam>,
            lr:float|slice=0.001, splitter:Callable=<function
            trainable_params>, cbs:Callback|MutableSequence|None=None,
            metrics:Callable|MutableSequence|None=None,
            path:str|Path|None=None, model_dir:str|Path='models',
            wd:float|int|None=None, wd_bn_bias:bool=False,
            train_bn:bool=True, default_cbs:bool=True)

Add functionality to TextLearner when dealing with a language model

	Type	Default	Details
dls	DataLoaders		Text `DataLoaders`
model			A standard PyTorch model
alpha	float	2.0	Param for `RNNRegularizer`
beta	float	1.0	Param for `RNNRegularizer`
moms	tuple	(0.8, 0.7, 0.8)	Momentum for `Cosine Annealing Scheduler`
loss_func	Optional	None	Loss function. Defaults to `dls` loss
opt_func	fastai.optimizer.Optimizer \| fastai.optimizer.OptimWrapper	Adam	Optimization function for training
lr	float \| slice	0.001	Default learning rate
splitter	Callable	trainable_params	Split model into parameter groups. Defaults to one parameter group
cbs	fastai.callback.core.Callback \| collections.abc.MutableSequence \| None	None	`Callback`s to add to `Learner`
metrics	Union	None	`Metric`s to calculate on validation set
path	str \| pathlib.Path \| None	None	Parent directory to save, load, and export models. Defaults to `dls` `path`
model_dir	str \| pathlib.Path	models	Subdirectory to save and load models
wd	float \| int \| None	None	Default weight decay
wd_bn_bias	bool	False	Apply weight decay to normalization and bias parameters
train_bn	bool	True	Train frozen normalization layers
default_cbs	bool	True	Include default `Callback`s

LMLearner.predict

 LMLearner.predict (text, n_words=1, no_unk=True, temperature=1.0,
                    min_p=None, no_bar=False, decoder=<function
                    decode_spec_tokens>, only_last_word=False)

Return text and the n_words that come after

The words are picked randomly among the predictions, depending on the probability of each index. no_unk means we never pick the UNK token, and temperature is applied to the predictions. If min_p is passed, we don’t consider the indices with a probability lower than it. Set no_bar to True if you don’t want any progress bar, and you can pass along a custom decoder to process the predicted tokens.
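
The sampling step for one predicted word can be sketched on a plain list of probabilities. Assumptions: UNK removal, then min_p filtering, then temperature, applied in that order, with the temperature applied by raising each probability to the power 1/temperature. `sample_next` is hypothetical. If every index falls below min_p, this sketch keeps the unfiltered weights instead of failing.

```python
import random
from typing import List, Optional


def sample_next(probs: List[float], unk_idx: Optional[int] = None,
                temperature: float = 1.0, min_p: Optional[float] = None,
                rng: Optional[random.Random] = None) -> int:
    """Pick the index of the next word from a probability list."""
    rng = rng or random.Random()
    weights = list(probs)
    if unk_idx is not None:
        weights[unk_idx] = 0.0  # no_unk: never pick UNK
    if min_p is not None and any(w >= min_p for w in weights):
        weights = [w if w >= min_p else 0.0 for w in weights]  # drop low-probability indices
    if temperature != 1.0:
        # temperature < 1 sharpens the distribution, > 1 flattens it
        weights = [w ** (1.0 / temperature) for w in weights]
    # random.choices normalizes the weights, so they need not sum to 1
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


rng = random.Random(0)
probs = [0.05, 0.6, 0.3, 0.05]  # index 0 plays the role of UNK
print(sample_next(probs, unk_idx=0, temperature=0.7, min_p=0.1, rng=rng))  # always 1 or 2
```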

`Learner` convenience functions

language_model_learner

 language_model_learner (dls, arch, config=None, drop_mult=1.0,
                         backwards=False, pretrained=True,
                         pretrained_fnames=None,
                         loss_func:Callable|None=None,
                         opt_func:Optimizer|OptimWrapper=<function Adam>,
                         lr:float|slice=0.001, splitter:Callable=<function
                         trainable_params>,
                         cbs:Callback|MutableSequence|None=None,
                         metrics:Callable|MutableSequence|None=None,
                         path:str|Path|None=None,
                         model_dir:str|Path='models',
                         wd:float|int|None=None, wd_bn_bias:bool=False,
                         train_bn:bool=True, moms:tuple=(0.95, 0.85,
                         0.95), default_cbs:bool=True)

Create a Learner with a language model from dls and arch.

	Type	Default	Details
dls	DataLoaders		`DataLoaders` containing fastai or PyTorch `DataLoader`s
arch
config	NoneType	None
drop_mult	float	1.0
backwards	bool	False
pretrained	bool	True
pretrained_fnames	NoneType	None
loss_func	Optional	None	Loss function. Defaults to `dls` loss
opt_func	fastai.optimizer.Optimizer \| fastai.optimizer.OptimWrapper	Adam	Optimization function for training
lr	float \| slice	0.001	Default learning rate
splitter	Callable	trainable_params	Split model into parameter groups. Defaults to one parameter group
cbs	fastai.callback.core.Callback \| collections.abc.MutableSequence \| None	None	`Callback`s to add to `Learner`
metrics	Union	None	`Metric`s to calculate on validation set
path	str \| pathlib.Path \| None	None	Parent directory to save, load, and export models. Defaults to `dls` `path`
model_dir	str \| pathlib.Path	models	Subdirectory to save and load models
wd	float \| int \| None	None	Default weight decay
wd_bn_bias	bool	False	Apply weight decay to normalization and bias parameters
train_bn	bool	True	Train frozen normalization layers
moms	tuple	(0.95, 0.85, 0.95)	Default momentum for schedulers
default_cbs	bool	True	Include default `Callback`s

You can use config to customize the architecture used (change the values from awd_lstm_lm_config for this). pretrained will use fastai’s pretrained model for this arch (if available), or you can pass specific pretrained_fnames containing your own pretrained model and the corresponding vocabulary. All other arguments are passed to Learner.
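
A minimal sketch of how pretrained_fnames might be turned into file paths, assuming it is a pair of base names (weights first, vocabulary second) that are looked up in path/model_dir with the extensions .pth and .pkl. `resolve_pretrained` and the file names are hypothetical; check the library source before relying on this convention.

```python
from pathlib import Path
from typing import List, Sequence


def resolve_pretrained(path: str, model_dir: str,
                       pretrained_fnames: Sequence[str]) -> List[Path]:
    """Map [weights_name, vocab_name] to [weights_file, vocab_file]."""
    if len(pretrained_fnames) != 2:
        raise ValueError("expected [weights_name, vocab_name]")
    folder = Path(path) / model_dir
    return [folder / f'{name}.{ext}' for name, ext in zip(pretrained_fnames, ('pth', 'pkl'))]


wgts_file, vocab_file = resolve_pretrained('/data/imdb', 'models', ['my_lm', 'my_vocab'])
print(wgts_file.as_posix(), vocab_file.as_posix())
# /data/imdb/models/my_lm.pth /data/imdb/models/my_vocab.pkl
```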

from fastai.text.all import *
import pandas as pd

path = untar_data(URLs.IMDB_SAMPLE)  # downloads and extracts the sample on first use
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', is_lm=True, valid_col='is_valid')
learn = language_model_learner(dls, AWD_LSTM)

You can then use the .predict method to generate new text.

learn.predict('This movie is about', n_words=20)

'This movie is about plans by Tom Cruise to win a loyalty sharing award at the Battle of Christmas'

By default, the entire sentence is fed to the model again after each predicted word; this little trick improves the quality of the generated text. If you want to feed only the last word, set only_last_word=True.

learn.predict('This movie is about', n_words=20, only_last_word=True)

'This movie is about the J. Intelligent , ha - agency . Griffith , and Games on the early after'

text_classifier_learner

 text_classifier_learner (dls, arch, seq_len=72, config=None,
                          backwards=False, pretrained=True, drop_mult=0.5,
                          n_out=None, lin_ftrs=None, ps=None,
                          max_len=1440, y_range=None,
                          loss_func:Callable|None=None,
                          opt_func:Optimizer|OptimWrapper=<function Adam>,
                          lr:float|slice=0.001,
                          splitter:Callable=<function trainable_params>,
                          cbs:Callback|MutableSequence|None=None,
                          metrics:Callable|MutableSequence|None=None,
                          path:str|Path|None=None,
                          model_dir:str|Path='models',
                          wd:float|int|None=None, wd_bn_bias:bool=False,
                          train_bn:bool=True, moms:tuple=(0.95, 0.85,
                          0.95), default_cbs:bool=True)

Create a Learner with a text classifier from dls and arch.

	Type	Default	Details
dls	DataLoaders		`DataLoaders` containing fastai or PyTorch `DataLoader`s
arch
seq_len	int	72
config	NoneType	None
backwards	bool	False
pretrained	bool	True
drop_mult	float	0.5
n_out	NoneType	None
lin_ftrs	NoneType	None
ps	NoneType	None
max_len	int	1440
y_range	NoneType	None
loss_func	Optional	None	Loss function. Defaults to `dls` loss
opt_func	fastai.optimizer.Optimizer \| fastai.optimizer.OptimWrapper	Adam	Optimization function for training
lr	float \| slice	0.001	Default learning rate
splitter	Callable	trainable_params	Split model into parameter groups. Defaults to one parameter group
cbs	fastai.callback.core.Callback \| collections.abc.MutableSequence \| None	None	`Callback`s to add to `Learner`
metrics	Union	None	`Metric`s to calculate on validation set
path	str \| pathlib.Path \| None	None	Parent directory to save, load, and export models. Defaults to `dls` `path`
model_dir	str \| pathlib.Path	models	Subdirectory to save and load models
wd	float \| int \| None	None	Default weight decay
wd_bn_bias	bool	False	Apply weight decay to normalization and bias parameters
train_bn	bool	True	Train frozen normalization layers
moms	tuple	(0.95, 0.85, 0.95)	Default momentum for schedulers
default_cbs	bool	True	Include default `Callback`s

You can use config to customize the architecture used (change the values from awd_lstm_clas_config for this). pretrained will use fastai’s pretrained model for this arch (if available). drop_mult is a global multiplier applied to all dropouts. n_out is usually inferred from the dls, but you may pass it explicitly.
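
A minimal sketch of what a global dropout multiplier does, assuming the dropout probabilities are exactly the config keys ending in '_p'. `scale_dropouts` is hypothetical, and the key names and values below are illustrative rather than copied from awd_lstm_clas_config.

```python
from typing import Dict, Union

Config = Dict[str, Union[int, float, bool]]


def scale_dropouts(config: Config, drop_mult: float) -> Config:
    """Return a copy of config with every '*_p' entry multiplied by drop_mult.

    Assumes dropout probabilities are exactly the keys ending in '_p';
    the input dict is left untouched.
    """
    return {k: (v * drop_mult if k.endswith('_p') else v) for k, v in config.items()}


cfg = {'emb_sz': 400, 'input_p': 0.4, 'hidden_p': 0.3, 'bidir': False}  # illustrative values
print(scale_dropouts(cfg, 0.5))
# {'emb_sz': 400, 'input_p': 0.2, 'hidden_p': 0.15, 'bidir': False}
```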

The model uses a SentenceEncoder, which means the texts are passed seq_len tokens at a time, and gradients are only computed on the last max_len steps. lin_ftrs and ps are passed to get_text_classifier.
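
The chunking can be sketched as a plan over token positions. Here seq_len is the chunk size (as in text_classifier_learner) and n_tokens the length of one text. The sketch assumes a chunk starting at position i has its output kept only when n_tokens - i <= max_len, so earlier chunks are run only to advance the hidden state; `chunk_plan` is hypothetical.

```python
from typing import List, Optional, Tuple


def chunk_plan(n_tokens: int, seq_len: int,
               max_len: Optional[int]) -> List[Tuple[int, int, bool]]:
    """Split n_tokens into seq_len-sized chunks as (start, end, keep) triples.

    keep=True marks chunks whose outputs are kept (and so receive gradients),
    under the assumed rule n_tokens - start <= max_len.
    """
    plan = []
    for start in range(0, n_tokens, seq_len):
        end = min(start + seq_len, n_tokens)
        keep = max_len is None or n_tokens - start <= max_len
        plan.append((start, end, keep))
    return plan


for start, end, keep in chunk_plan(200, seq_len=72, max_len=144):
    print(start, end, 'kept' if keep else 'output discarded')
# 0 72 output discarded
# 72 144 kept
# 144 200 kept
```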

All other arguments are passed to Learner.

from fastai.text.all import *
import pandas as pd

path = untar_data(URLs.IMDB_SAMPLE)  # downloads and extracts the sample on first use
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', label_col='label', valid_col='is_valid')
learn = text_classifier_learner(dls, AWD_LSTM)
