Tensorboard
First things first, you need to install tensorboard with
pip install tensorboard
Then launch tensorboard with
tensorboard --logdir=runs
in your terminal. You can change the logdir as long as it matches the log_dir you pass to TensorBoardCallback (the default is runs in the working directory).
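The only requirement is that the two sides agree. A minimal sketch with a custom directory (my_logs is an arbitrary name chosen for illustration):
tensorboard --logdir=my_logs
and in your training code:
cbs = [TensorBoardCallback(log_dir='my_logs')]
learn.fit_one_cycle(3, cbs=cbs)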
Tensorboard Embedding Projector support
Tensorboard Embedding Projector is currently only supported for image classification
Export Image Features during Training
Tensorboard Embedding Projector is supported in TensorBoardCallback (set parameter projector=True) during training. The validation set embeddings will be written after each epoch.
cbs = [TensorBoardCallback(projector=True)]
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fit_one_cycle(3, cbs=cbs)
Export Image Features during Inference
To write the embeddings for a custom dataset (e.g. after loading a learner) use TensorBoardProjectorCallback. Add the callback manually to the learner.
learn = load_learner('path/to/export.pkl')
learn.add_cb(TensorBoardProjectorCallback())
dl = learn.dls.test_dl(files, with_labels=True)
_ = learn.get_preds(dl=dl)
If you're using a custom model (i.e. not a fastai resnet), pass the layer from which the embeddings should be extracted as a callback parameter.
layer = learn.model[1][1]
cbs = [TensorBoardProjectorCallback(layer=layer)]
preds = learn.get_preds(dl=dl, cbs=cbs)
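If you're unsure which layer to pick, printing the model shows the full module tree; the indices above (model[1][1]) are just one example and depend on your architecture:
# inspect the module tree to find a suitable feature layer (e.g. a pooling layer near the head)
print(learn.model)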
Export Word Embeddings from Language Models
You can export word embeddings from language models. This has been tested with AWD_LSTM (fast.ai) and GPT2 / BERT (transformers), but it should work with every model that contains an embedding layer.
For a fast.ai TextLearner or LMLearner just pass the learner - the embedding layer and vocab will be extracted automatically:
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
projector_word_embeddings(learn=learn, limit=2000, start=2000)
For other language models - like the ones in the transformers library - you’ll have to pass the layer and vocab. Here’s an example for a BERT model.
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
# get the word embedding layer
layer = model.embeddings.word_embeddings
# get and sort vocab
vocab_dict = tokenizer.get_vocab()
vocab = [k for k, v in sorted(vocab_dict.items(), key=lambda x: x[1])]
# write the embeddings for tb projector
projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, start=2000)
TensorBoardBaseCallback
TensorBoardBaseCallback ()
Basic class handling tweaks of the training loop by changing a Learner in various events
TensorBoardCallback
TensorBoardCallback (log_dir=None, trace_model=True, log_preds=True, n_preds=9, projector=False, layer=None)
Saves model topology, losses & metrics for tensorboard and tensorboard projector during training
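As a quick sketch of the remaining parameters (the directory name below is illustrative): log_preds=True writes n_preds sample predictions from the validation set after each epoch, and trace_model=True additionally exports the model graph.
cbs = TensorBoardCallback(log_dir='runs/exp1',  # must match tensorboard --logdir
                          trace_model=True,     # export the model graph
                          log_preds=True,       # log sample predictions each epoch
                          n_preds=9)            # number of samples to show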
TensorBoardProjectorCallback
TensorBoardProjectorCallback (log_dir=None, layer=None)
Extracts and exports image features for tensorboard projector during inference
projector_word_embeddings
projector_word_embeddings (learn=None, layer=None, vocab=None, limit=-1, start=0, log_dir=None)
Extracts and exports word embeddings from language models' embedding layers
TensorBoardCallback
from fastai.vision.all import (Resize, RandomSubsetSplitter, aug_transforms, vision_learner,
                               resnet18, untar_data, URLs, DataBlock, ImageBlock, CategoryBlock,
                               get_image_files, RegexLabeller, using_attr, accuracy, Path)
path = untar_data(URLs.PETS)

db = DataBlock(blocks=(ImageBlock, CategoryBlock),
               get_items=get_image_files,
               item_tfms=Resize(128),
               splitter=RandomSubsetSplitter(train_sz=0.1, valid_sz=0.01),
               batch_tfms=aug_transforms(size=64),
               get_y=using_attr(RegexLabeller(r'(.+)_\d+.*$'), 'name'))

dls = db.dataloaders(path/'images')
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.unfreeze()
learn.fit_one_cycle(3, cbs=TensorBoardCallback(Path.home()/'tmp'/'runs'/'tb', trace_model=True))
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 4.973294 | 5.009670 | 0.082192 | 00:03 |
1 | 4.382769 | 4.438282 | 0.095890 | 00:03 |
2 | 3.877172 | 3.665855 | 0.178082 | 00:04 |
Projector
Projector in TensorBoardCallback
path = untar_data(URLs.PETS)

db = DataBlock(blocks=(ImageBlock, CategoryBlock),
               get_items=get_image_files,
               item_tfms=Resize(128),
               splitter=RandomSubsetSplitter(train_sz=0.05, valid_sz=0.01),
               batch_tfms=aug_transforms(size=64),
               get_y=using_attr(RegexLabeller(r'(.+)_\d+.*$'), 'name'))

dls = db.dataloaders(path/'images')
cbs = [TensorBoardCallback(log_dir=Path.home()/'tmp'/'runs'/'vision1', projector=True)]
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.unfreeze()
learn.fit_one_cycle(3, cbs=cbs)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 5.143322 | 6.736727 | 0.082192 | 00:03 |
1 | 4.508100 | 5.106580 | 0.109589 | 00:03 |
2 | 4.057889 | 4.194602 | 0.068493 | 00:03 |
TensorBoardProjectorCallback
path = untar_data(URLs.PETS)

db = DataBlock(blocks=(ImageBlock, CategoryBlock),
               get_items=get_image_files,
               item_tfms=Resize(128),
               splitter=RandomSubsetSplitter(train_sz=0.1, valid_sz=0.01),
               batch_tfms=aug_transforms(size=64),
               get_y=using_attr(RegexLabeller(r'(.+)_\d+.*$'), 'name'))

dls = db.dataloaders(path/'images')

files = get_image_files(path/'images')
files = files[:256]
dl = learn.dls.test_dl(files, with_labels=True)

learn = vision_learner(dls, resnet18, metrics=accuracy)
layer = learn.model[1][0].ap
cbs = [TensorBoardProjectorCallback(layer=layer, log_dir=Path.home()/'tmp'/'runs'/'vision2')]

_ = learn.get_preds(dl=dl, cbs=cbs)
projector_word_embeddings
fastai text or lm learner
from fastai.text.all import TextDataLoaders, text_classifier_learner, AWD_LSTM
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
projector_word_embeddings(learn, limit=1000, log_dir=Path.home()/'tmp'/'runs'/'text')
transformers
GPT2
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
layer = model.transformer.wte
vocab_dict = tokenizer.get_vocab()
vocab = [k for k, v in sorted(vocab_dict.items(), key=lambda x: x[1])]
projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, log_dir=Path.home()/'tmp'/'runs'/'transformers')
BERT
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

layer = model.embeddings.word_embeddings

vocab_dict = tokenizer.get_vocab()
vocab = [k for k, v in sorted(vocab_dict.items(), key=lambda x: x[1])]

projector_word_embeddings(layer=layer, vocab=vocab, limit=2000, start=2000, log_dir=Path.home()/'tmp'/'runs'/'transformers')
warning: Embedding dir exists, did you set global_step for add_embedding()?
(This warning is expected here: the GPT2 and BERT examples write to the same log_dir, so the second run finds the existing embedding directory.)
Validate results in tensorboard
Run the following command in the command line to check whether the projector embeddings have been written correctly:
tensorboard --logdir=~/tmp/runs
Open http://localhost:6006 in your browser (TensorBoard Projector doesn’t work correctly in Safari!)