
What Is The “Information Bottleneck” Problem in Seq2Seq Models?

Angelina Yang
3 min read · Sep 27, 2022


There are plenty of deep explanations elsewhere, so here I'd like to share some example questions in an interview setting.

What is the “information bottleneck” problem in Seq2Seq models?

Image source: New Theory Cracks Open the Black Box of Deep Learning

What could be a solution to this problem?

Here are some example answers for readers’ reference:

Question 1:

In an encoder-decoder architecture (for instance, neural machine translation), a Recurrent Neural Network (RNN) takes in a tokenized version of a sentence in its encoder and passes it on to the decoder. A regular sequence-to-sequence model with LSTMs works effectively for short to medium sentences but starts to degrade for longer ones. You can picture all of the context of the input sentence being compressed into one fixed-size vector that is passed into the decoder block. You can see how this will be an information bottleneck: no matter how long or complex the input is, everything the decoder knows about it has to fit through that single vector.
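To make the bottleneck concrete, here is a minimal sketch in PyTorch (the layer sizes, the `embedding`/`encoder`/`decoder` names, and the dummy tensors are illustrative assumptions, not from the article). It shows that however long the source sentence is, the decoder is initialized from nothing but the encoder's final hidden and cell states:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim = 10_000, 256, 512   # illustrative sizes

embedding = nn.Embedding(vocab_size, emb_dim)
encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)

# A 50-token "sentence" of random token ids stands in for real input.
src = torch.randint(0, vocab_size, (1, 50))
_, (h, c) = encoder(embedding(src))    # h, c each: shape (1, 1, hid_dim)

# Everything the decoder will ever see about those 50 tokens is (h, c):
# a fixed-size summary, regardless of input length -- the bottleneck.
sos = torch.randint(0, vocab_size, (1, 1))   # stand-in start-of-sequence token
out, _ = decoder(embedding(sos), (h, c))
```

Note that whether `src` is 5 tokens or 500, `h` and `c` stay the same size; that fixed capacity is exactly what degrades translation quality on long sentences.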
