How to Add New Tokens to a Transformer Model Vocabulary?

Angelina Yang
Jun 18, 2022

In this post, we will see how to expand the vocabulary of a Transformer model by adding your own words or tokens.

Why do you need to expand the vocabulary?

Every language model trained for an NLP task has a vocabulary. The vocabulary is the set of unique words (or tokens) in the text corpus the model was trained on. Its contents therefore depend on the domain and the corpus. Pre-trained language models are no exception.
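To make the idea concrete, here is a minimal sketch in plain Python. It is only an illustration: the two-sentence corpus, the whitespace tokenization, and the helper name `find_unknown_words` are assumptions made for this example, and real pre-trained models typically use subword tokenizers instead.

```python
# Toy corpus standing in for the text a model was trained on (hypothetical data).
corpus = [
    "the patient was given aspirin",
    "the doctor examined the patient",
]

# Assumption: split on whitespace for simplicity.
# The vocabulary is the set of unique words seen in the corpus.
vocab = sorted({word for sentence in corpus for word in sentence.split()})

# Each vocabulary entry gets an integer id, as in a real tokenizer.
token_to_id = {token: idx for idx, token in enumerate(vocab)}


def find_unknown_words(text, known_tokens):
    """Return the words in `text` that are missing from the vocabulary."""
    return [word for word in text.split() if word not in known_tokens]


# A sentence from a new domain contains a word the vocabulary has never seen.
new_text = "the patient was given remdesivir"
unknown = find_unknown_words(new_text, token_to_id)

print(vocab)
print(unknown)
```

Words like the one flagged here are the candidates you would want to add when the model is applied to a domain its original corpus did not cover.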