What’s the Difference Between Attention and Self-attention in Transformer Models?
“Attention” is one of the key ideas in the transformer architecture. There are many deep explanations elsewhere, so here we’d like to share tips on what you can say in an interview setting.
What’s the difference between attention and self-attention in transformer models?
Here are some example answers for readers’ reference:
Attention connecting the encoder and the decoder is called cross-attention, since the keys and values are generated from a different sequence than the queries. If the keys, values, and queries are all generated from the same sequence, we call it self-attention. Cross-attention lets the output focus on relevant parts of the input while it is being produced, whereas self-attention lets the elements of a single sequence interact with each other.
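To make the distinction concrete, here is a minimal NumPy sketch (function and variable names are ours, not from any particular library): the same scaled dot-product attention routine computes self-attention when queries, keys, and values are projected from one sequence, and cross-attention when the queries come from a different sequence.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (len_q, len_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
d_model = 8
# Toy projection matrices (these would be learned in a real model).
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))

# Self-attention: Q, K, V all come from the same sequence x.
x = rng.standard_normal((5, d_model))                     # 5 "encoder" tokens
out_self, w_self = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)

# Cross-attention: queries come from the decoder sequence y,
# while keys and values come from the encoder output x.
y = rng.standard_normal((3, d_model))                     # 3 "decoder" tokens
out_cross, w_cross = scaled_dot_product_attention(y @ Wq, x @ Wk, x @ Wv)

print(out_self.shape, w_self.shape)    # (5, 8) (5, 5)
print(out_cross.shape, w_cross.shape)  # (3, 8) (3, 5)
```

Note that the attention weight matrix is square only in the self-attention case; in cross-attention its shape is (decoder length, encoder length), which is exactly the "output attending to input" pattern described above.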