What Are The Main Advantages of BERT Over LSTM Models?

Angelina Yang
2 min read · Aug 15, 2022

There are plenty of in-depth explanations of BERT and LSTM models elsewhere, so here I’d like to share tips on what you can say in an interview.

What are the main advantages of BERT over LSTM models?

Source: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Here are some example answers for readers’ reference:

Answer 1:

The main advantages of BERT over LSTM models are as follows, based on an explanation by Dr. Jacob Devlin from Google AI Language:

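1. With the self-attention mechanism, BERT has no locality bias: long-distance context has “equal opportunity” with short-distance context, because every token can attend directly to every other token in the sequence. An LSTM, by contrast, has to carry information step by step through its hidden state.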

Advantage #1

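2. A single multiplication per layer improves efficiency on TPUs: all tokens in a batch go through one large matrix multiplication per layer, so the effective batch size is the number of words, not the number of sequences. An LSTM has to process a sequence one time step at a time.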

Advantage #2

Source: Stanford CS224N
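
If you want something concrete to point to in an interview, here is a minimal NumPy sketch of both advantages. It is a toy illustration, not BERT itself: the sizes, weight matrices, and function names are made up for the example, positional embeddings are left out, and a plain tanh recurrence stands in for a full LSTM cell.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, d = 2, 5, 4  # hypothetical toy sizes
x = rng.normal(size=(batch, seq_len, d))
w_q, w_k, w = (rng.normal(size=(d, d)) for _ in range(3))
w_x, w_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))


def self_attention_weights(x, w_q, w_k):
    """Advantage 1: every token scores every other token directly."""
    q, k = x @ w_q, x @ w_k
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)  # (batch, seq_len, seq_len)


def tokenwise_layer(x, w):
    """Advantage 2: one matmul over all tokens of all sequences."""
    b, t, dim = x.shape
    flat = x.reshape(b * t, dim)  # rows = batch * seq_len
    return (flat @ w).reshape(b, t, -1), flat.shape[0]


def recurrent_layer(x, w_x, w_h):
    """Simplified recurrence (assumption: stands in for an LSTM cell)."""
    b, t, _ = x.shape
    h = np.zeros((b, w_h.shape[0]))
    steps = 0
    for i in range(t):  # each step depends on the previous hidden state
        h = np.tanh(x[:, i] @ w_x + h @ w_h)
        steps += 1
    return h, steps


weights = self_attention_weights(x, w_q, w_k)

# Shuffling token positions only shuffles the weights the same way:
# the score between two tokens does not depend on their distance.
perm = rng.permutation(seq_len)
weights_perm = self_attention_weights(x[:, perm], w_q, w_k)
same_up_to_order = np.allclose(weights_perm, weights[:, perm][:, :, perm])

out, rows = tokenwise_layer(x, w)
h, steps = recurrent_layer(x, w_x, w_h)
print(rows, steps)  # 10 rows in one matmul vs. 5 sequential steps
```

The permutation check is the "no locality bias" point in miniature: without positional information, attention between two tokens depends only on their content. The `rows` versus `steps` comparison shows why the effective batch size is the number of words: the token-wise layer handles all `batch * seq_len` rows at once, while the recurrence needs `seq_len` dependent steps that each see only `batch` rows.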

Happy practicing!
