What Can Go Wrong When Fine-tuning BERT?

Angelina Yang
2 min read · Feb 27

There are plenty of explanations elsewhere; here I'd like to share some example questions you might encounter in an interview setting.

When fine-tuning BERT (Bidirectional Encoder Representations from Transformers) for your use case, what can go wrong? Or what should you pay attention to?

Source: Illustration of the pre-training/fine-tuning approach. Three different downstream NLP tasks (MNLI, NER, and SQuAD) are all solved with the same pre-trained language model by fine-tuning on the specific task. Image credit: Devlin et al., 2019.
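One concrete thing to pay attention to is the optimizer setup: Devlin et al. fine-tune BERT with small peak learning rates (2e-5 to 5e-5), and a learning-rate warmup followed by decay is standard practice; a large constant learning rate can destabilize fine-tuning or wipe out the pre-trained weights. Below is a minimal sketch of a linear warmup-then-decay schedule in plain Python. The function name and default values are illustrative, not from any particular library.

```python
def linear_warmup_decay(step, total_steps, peak_lr=2e-5, warmup_frac=0.1):
    """Linearly ramp the learning rate up over the first warmup_frac of
    training, then linearly decay it to zero by total_steps.

    Hypothetical helper for illustration only.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Warmup phase: LR grows linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: LR shrinks linearly from peak_lr to 0 at total_steps.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))

# Example: 1000 training steps with 10% warmup.
schedule = [linear_warmup_decay(s, 1000) for s in range(1000)]
```

In practice you would hand a schedule like this to your optimizer each step; libraries such as Hugging Face Transformers ship equivalent built-in schedulers.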