What Is “Attention”, Really?

Angelina Yang
2 min read · Jul 22, 2022


“Attention” is one of the key ideas in the transformer architecture. There are plenty of deep explanations elsewhere, so here I’d like to share tips on what you can say in an interview setting.

What is “attention”, really?

Here are some example answers for readers’ reference:

Sequence-to-sequence machine translation models suffer from an “information bottleneck” problem: all the information about the source sentence must be compressed into a single vector at the end of the encoder, and that vector is the only thing passed along to the decoder. If some information about the source sentence isn’t captured in this vector, the decoder won’t be able to translate it correctly. This is the motivation for “attention”.

“Attention” is a neural technique that solves this bottleneck problem. The core idea is that at each step of the decoder, you use a direct connection to the encoder to focus on a particular part of the source sequence.
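To make that concrete, here is a minimal sketch of dot-product attention at a single decoder step, written in plain NumPy. It is not tied to any particular model or library, and the array names (encoder_states, decoder_state) are just illustrative: the decoder state scores every encoder state, the scores become a probability distribution, and the weighted sum of encoder states is the context vector the decoder gets to look at.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dot_product_attention(decoder_state, encoder_states):
    """Attend over encoder states using one decoder state as the query.

    decoder_state:  shape (d,)   hidden state at the current decoder step
    encoder_states: shape (T, d) hidden states for the T source tokens
    Returns the context vector (d,) and the attention weights (T,).
    """
    scores = encoder_states @ decoder_state   # (T,) one score per source token
    weights = softmax(scores)                 # attention distribution over the source
    context = weights @ encoder_states        # (d,) weighted sum of encoder states
    return context, weights

# Toy example: 5 source tokens, hidden size 8.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))
decoder_state = rng.normal(size=(8,))
context, weights = dot_product_attention(decoder_state, encoder_states)
print(weights)         # sums to 1; shows which source positions the decoder focuses on
print(context.shape)   # (8,) — fed to the decoder alongside its own hidden state
```

Because the context vector is recomputed at every decoder step, the decoder is no longer limited to the single vector produced at the end of the encoder.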

Watch the explanation by Dr. Abby See:


Happy practicing!
