Text learner

All the functions necessary to build a Learner suitable for transfer learning in NLP

The most important functions of this module are language_model_learner and text_classifier_learner. They will help you define a Learner using a pretrained model. See the text tutorial for examples of use.

Loading a pretrained model

In text, to load a pretrained model, we need to adapt the embeddings of the vocabulary used for the pre-training to the vocabulary of our current corpus.


source

match_embeds


def match_embeds(
    old_wgts:dict, # Embedding weights
    old_vocab:list, # Vocabulary of corpus used for pre-training
    new_vocab:list, # Current corpus vocabulary
)->dict:

Convert the embeddings in old_wgts to go from old_vocab to new_vocab.

For words in new_vocab that don’t have a corresponding match in old_vocab, we use the mean of all pretrained embeddings.

wgts = {'0.encoder.weight': torch.randn(5,3)}
new_wgts = match_embeds(wgts.copy(), ['a', 'b', 'c'], ['a', 'c', 'd', 'b'])
old,new = wgts['0.encoder.weight'],new_wgts['0.encoder.weight']
test_eq(new[0], old[0])
test_eq(new[1], old[2])
test_eq(new[2], old.mean(0))
test_eq(new[3], old[1])
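The matching logic can be sketched with plain Python lists (a simplified illustration of the idea, not fastai's tensor-based implementation, which also handles the decoder weights and bias):

```python
def match_embeds_sketch(old_wgts, old_vocab, new_vocab):
    """Map embedding rows from old_vocab order to new_vocab order.

    Simplified sketch using lists of rows; words missing from old_vocab
    get the mean of all pretrained rows, as described above.
    """
    old_emb = old_wgts['0.encoder.weight']              # list of embedding rows
    n = len(old_emb)
    mean_row = [sum(col) / n for col in zip(*old_emb)]  # fallback for unseen words
    idx = {w: i for i, w in enumerate(old_vocab)}       # word -> row index
    old_wgts['0.encoder.weight'] = [
        old_emb[idx[w]] if w in idx else mean_row for w in new_vocab
    ]
    return old_wgts
```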

source

load_ignore_keys


def load_ignore_keys(
    model, # Model architecture
    wgts:dict, # Model weights
)->tuple:

Load wgts in model, ignoring the names of the keys and just taking the parameters in order
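The positional-matching idea can be sketched with plain dictionaries (a simplified illustration; the real function operates on an nn.Module and loads the resulting state dict into it):

```python
def load_ignore_keys_sketch(model_sd, wgts):
    """Pair saved tensors with model parameters by position, not by name.

    model_sd: the model's state dict (its key order defines the targets).
    wgts: saved weights whose key names may not match.
    """
    # zip relies on dicts preserving insertion order (Python 3.7+).
    return {name: value for name, value in zip(model_sd, wgts.values())}
```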


source

clean_raw_keys


def clean_raw_keys(
    wgts:dict
):

source

load_model_text


def load_model_text(
    file:str, # File name of saved text model
    model, # Model architecture
    opt:Optimizer, # [`Optimizer`](https://docs.fast.ai/optimizer.html#optimizer) used to fit the model
    with_opt:bool=None, # Enable to load [`Optimizer`](https://docs.fast.ai/optimizer.html#optimizer) state
    device:int | str | torch.device=None, # Sets the device, uses 'cpu' if unspecified
    strict:bool=True, # Whether to strictly enforce the keys of `file`s state dict match with the model `Module.state_dict`
    kwargs:VAR_KEYWORD
):

Load model from file along with opt (if available, and if with_opt)


source

TextLearner


def TextLearner(
    dls:DataLoaders, # Text [`DataLoaders`](https://docs.fast.ai/data.core.html#dataloaders)
    model, # A standard PyTorch model
    alpha:float=2.0, # Param for [`RNNRegularizer`](https://docs.fast.ai/callback.rnn.html#rnnregularizer)
    beta:float=1.0, # Param for [`RNNRegularizer`](https://docs.fast.ai/callback.rnn.html#rnnregularizer)
    moms:tuple=(0.8, 0.7, 0.8), # Momentum for `Cosine Annealing Scheduler`
    kwargs:VAR_KEYWORD
):

Basic class for a Learner in NLP.

rnn_cbs(2., 1.)
[ModelResetter, RNNCallback, RNNRegularizer]

Adds a ModelResetter and an RNNRegularizer with alpha and beta to the callbacks; the rest is the same as the Learner init.

This Learner adds functionality to the base class:


source

TextLearner.load_pretrained


def load_pretrained(
    wgts_fname:str, # Filename of saved weights
    vocab_fname:str, # Saved vocabulary filename in pickle format
    model:NoneType=None, # Model to load parameters from, defaults to `Learner.model`
):

Load a pretrained model and adapt it to the data vocabulary.

wgts_fname should point to the weights of the pretrained model and vocab_fname to the vocabulary used to pretrain it.


source

TextLearner.save_encoder


def save_encoder(
    file:str, # Filename for `Encoder`
):

Save the encoder to file in the model directory

The model directory is Learner.path/Learner.model_dir.
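The destination path follows a simple rule, sketched here with a hypothetical encoder_path helper (the .pth suffix is an assumption about fastai's file naming):

```python
from pathlib import Path

def encoder_path(learner_path, model_dir, file):
    """Where the encoder lands: path/model_dir/file.pth (illustrative rule only)."""
    return Path(learner_path) / model_dir / f'{file}.pth'
```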


source

TextLearner.load_encoder


def load_encoder(
    file:str, # Filename of the saved encoder
    device:int | str | torch.device=None, # Device used to load, defaults to `dls` device
):

Load the encoder file from the model directory, optionally ensuring it’s on device

Language modeling predictions

For language modeling, the predict method is quite different from that of the other applications, which is why it needs its own subclass.


source

decode_spec_tokens


def decode_spec_tokens(
    tokens
):

Decode the special tokens in tokens

test_eq(decode_spec_tokens(['xxmaj', 'text']), ['Text'])
test_eq(decode_spec_tokens(['xxup', 'text']), ['TEXT'])
test_eq(decode_spec_tokens(['xxrep', '3', 'a']), ['aaa'])
test_eq(decode_spec_tokens(['xxwrep', '3', 'word']), ['word', 'word', 'word'])
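A simplified re-implementation of the rules the tests above exercise (illustrative only, not fastai's actual decoder):

```python
def decode_spec_tokens_sketch(tokens):
    """Expand fastai special tokens: xxmaj, xxup, xxrep, xxwrep."""
    out, i = [], 0
    while i < len(tokens):
        t = tokens[i]
        if t == 'xxmaj':    # capitalize the next token
            out.append(tokens[i + 1].capitalize()); i += 2
        elif t == 'xxup':   # uppercase the next token
            out.append(tokens[i + 1].upper()); i += 2
        elif t == 'xxrep':  # repeat the next character n times, as one token
            n = int(tokens[i + 1]); out.append(tokens[i + 2] * n); i += 3
        elif t == 'xxwrep': # repeat the next word n times, as n tokens
            n = int(tokens[i + 1]); out.extend([tokens[i + 2]] * n); i += 3
        else:
            out.append(t); i += 1
    return out
```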

source

LMLearner


def LMLearner(
    dls:DataLoaders, # Text [`DataLoaders`](https://docs.fast.ai/data.core.html#dataloaders)
    model, # A standard PyTorch model
    alpha:float=2.0, # Param for [`RNNRegularizer`](https://docs.fast.ai/callback.rnn.html#rnnregularizer)
    beta:float=1.0, # Param for [`RNNRegularizer`](https://docs.fast.ai/callback.rnn.html#rnnregularizer)
    moms:tuple=(0.8, 0.7, 0.8), # Momentum for `Cosine Annealing Scheduler`
    kwargs:VAR_KEYWORD
):

Add functionality to TextLearner when dealing with a language model


source

LMLearner.predict


def predict(
    text,
    n_words:int=1,
    no_unk:bool=True,
    temperature:float=1.0,
    min_p:NoneType=None,
    no_bar:bool=False,
    decoder:function=decode_spec_tokens,
    only_last_word:bool=False
):

Return text and the n_words that come after

The words are picked randomly from the predictions, weighted by the probability of each index. no_unk means we never pick the UNK token; temperature is applied to the predictions; if min_p is passed, indices with a probability lower than it are not considered. Set no_bar to True if you don’t want any progress bar, and you can pass along a custom decoder to process the predicted tokens.
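These sampling rules can be sketched in plain Python (an illustrative sample_next helper, not fastai's implementation; unk_idx is an assumed position for the UNK token):

```python
import math
import random

def sample_next(probs, temperature=1.0, min_p=None, no_unk=True, unk_idx=0):
    """Sample a vocabulary index from predicted probabilities.

    Applies temperature, optionally masks the UNK token, and drops
    indices whose probability falls below min_p before sampling.
    """
    # Temperature rescales log-probabilities, then we renormalize.
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    if no_unk:
        probs[unk_idx] = 0.0                               # never pick UNK
    if min_p is not None:
        probs = [p if p >= min_p else 0.0 for p in probs]  # drop unlikely indices
    total = sum(probs)
    probs = [p / total for p in probs]
    return random.choices(range(len(probs)), weights=probs)[0]
```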

Learner convenience functions


source

language_model_learner


def language_model_learner(
    dls, # [`DataLoaders`](https://docs.fast.ai/data.core.html#dataloaders) containing fastai or PyTorch [`DataLoader`](https://docs.fast.ai/data.load.html#dataloader)s
    arch,
    config:NoneType=None,
    drop_mult:float=1.0,
    backwards:bool=False,
    pretrained:bool=True,
    pretrained_fnames:NoneType=None,
    loss_func:Callable | None=None, # Loss function. Defaults to `dls` loss
    opt_func:Optimizer | OptimWrapper=Adam, # Optimization function for training
    lr:float | slice=0.001, # Default learning rate
    splitter:Callable=trainable_params, # Split model into parameter groups. Defaults to one parameter group
    cbs:Callback | MutableSequence | None=None, # [`Callback`](https://docs.fast.ai/callback.core.html#callback)s to add to [`Learner`](https://docs.fast.ai/learner.html#learner)
    metrics:Callable | MutableSequence | None=None, # [`Metric`](https://docs.fast.ai/learner.html#metric)s to calculate on validation set
    path:str | Path | None=None, # Parent directory to save, load, and export models. Defaults to `dls` `path`
    model_dir:str | Path='models', # Subdirectory to save and load models
    wd:float | int | None=None, # Default weight decay
    wd_bn_bias:bool=False, # Apply weight decay to normalization and bias parameters
    train_bn:bool=True, # Train frozen normalization layers
    moms:tuple=(0.95, 0.85, 0.95), # Default momentum for schedulers
    default_cbs:bool=True, # Include default [`Callback`](https://docs.fast.ai/callback.core.html#callback)s
):

Create a Learner with a language model from dls and arch.

You can use config to customize the architecture used (change the values from awd_lstm_lm_config for this). pretrained will use fastai’s pretrained model for this arch (if available), or you can pass specific pretrained_fnames containing your own pretrained model and the corresponding vocabulary. All other arguments are passed to Learner.

path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', is_lm=True, valid_col='is_valid')
learn = language_model_learner(dls, AWD_LSTM)

You can then use the .predict method to generate new text.

learn.predict('This movie is about', n_words=20)
'This movie is about plans by Tom Cruise to win a loyalty sharing award at the Battle of Christmas'

By default, the entire sentence is fed to the model again after each predicted word; this little trick improves the quality of the generated text. If you want to feed only the last word, pass only_last_word=True.

learn.predict('This movie is about', n_words=20, only_last_word=True)
'This movie is about the J. Intelligent , ha - agency . Griffith , and Games on the early after'

source

text_classifier_learner


def text_classifier_learner(
    dls, # [`DataLoaders`](https://docs.fast.ai/data.core.html#dataloaders) containing fastai or PyTorch [`DataLoader`](https://docs.fast.ai/data.load.html#dataloader)s
    arch,
    seq_len:int=72,
    config:NoneType=None,
    backwards:bool=False,
    pretrained:bool=True,
    drop_mult:float=0.5,
    n_out:NoneType=None,
    lin_ftrs:NoneType=None,
    ps:NoneType=None,
    max_len:int=1440,
    y_range:NoneType=None,
    loss_func:Callable | None=None, # Loss function. Defaults to `dls` loss
    opt_func:Optimizer | OptimWrapper=Adam, # Optimization function for training
    lr:float | slice=0.001, # Default learning rate
    splitter:Callable=trainable_params, # Split model into parameter groups. Defaults to one parameter group
    cbs:Callback | MutableSequence | None=None, # [`Callback`](https://docs.fast.ai/callback.core.html#callback)s to add to [`Learner`](https://docs.fast.ai/learner.html#learner)
    metrics:Callable | MutableSequence | None=None, # [`Metric`](https://docs.fast.ai/learner.html#metric)s to calculate on validation set
    path:str | Path | None=None, # Parent directory to save, load, and export models. Defaults to `dls` `path`
    model_dir:str | Path='models', # Subdirectory to save and load models
    wd:float | int | None=None, # Default weight decay
    wd_bn_bias:bool=False, # Apply weight decay to normalization and bias parameters
    train_bn:bool=True, # Train frozen normalization layers
    moms:tuple=(0.95, 0.85, 0.95), # Default momentum for schedulers
    default_cbs:bool=True, # Include default [`Callback`](https://docs.fast.ai/callback.core.html#callback)s
):

Create a Learner with a text classifier from dls and arch.

You can use the config to customize the architecture used (change the values from awd_lstm_clas_config for this), pretrained will use fastai’s pretrained model for this arch (if available). drop_mult is a global multiplier applied to control all dropouts. n_out is usually inferred from the dls but you may pass it.

The model uses a SentenceEncoder, which means the texts are passed seq_len tokens at a time, and will only compute the gradients on the last max_len steps. lin_ftrs and ps are passed to get_text_classifier.

All other arguments are passed to Learner.
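The chunked traversal can be pictured with a tiny hypothetical helper (not the actual SentenceEncoder code): the text is walked seq_len tokens at a time, and only the final max_len steps contribute gradients.

```python
def chunk_tokens(tokens, seq_len):
    """Split a token sequence into consecutive chunks of at most seq_len tokens,
    the way a sentence encoder walks through a long document."""
    return [tokens[i:i + seq_len] for i in range(0, len(tokens), seq_len)]
```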

path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', label_col='label', valid_col='is_valid')
learn = text_classifier_learner(dls, AWD_LSTM)