What Is The “Information Bottleneck” Problem in Seq2Seq Models?
There are plenty of in-depth explanations elsewhere, so here I'd like to share some example questions you might encounter in an interview setting.
What is the “information bottleneck” problem in Seq2Seq models?
What could be a solution to this problem?
Here are some example answers for readers’ reference:
Question 1:
In an encoder-decoder architecture (for instance, neural machine translation), the Recurrent Neural Network (RNN) will take in a tokenized version of a sentence in its encoder, then passes it on to the decoder. Using a a regular sequence-to-sequence model with LSTMs will work effectively for short to medium sentences but will start to degrade for longer ones. You can picture it like the figure below where all of the context of the input sentence is compressed into one vector that is passed into the decoder block. You can see how this will be an…