What’s the Derivative of RNN at Step T?

Angelina Yang
3 min read · Apr 6, 2023


There are plenty of explanations elsewhere; here I'd like to share an example question in an interview setting.

One characteristic of an RNN language model is that it applies the same weight matrix repeatedly. So what is the derivative of the loss, say at step t, with respect to the repeated weight matrix of the hidden state, Wh? And how do you calculate it?

Source: A Brief Overview of Recurrent Neural Networks (RNN)

Here are some tips for readers’ reference:

Since the weight matrix Wh is shared across all time steps of the RNN, the derivative of the loss at step t with respect to Wh is the sum, over every time step i up to t, of the derivative of that loss with respect to Wh as it appears at step i.
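In symbols, a sketch of that statement (assuming the common notation where J^(t) is the loss at step t and the bar means "Wh as it appears at step i"; this notation is my gloss, not from the article):

```latex
\frac{\partial J^{(t)}}{\partial W_h}
  \;=\; \sum_{i=1}^{t} \left. \frac{\partial J^{(t)}}{\partial W_h} \right|_{(i)}
```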

To calculate it, backpropagate over the time steps from right to left, i = t, … 0, summing gradients as you go. You should compute them cumulatively, not separately.
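Here is a minimal sketch of that accumulation in NumPy, assuming a vanilla RNN with tanh hidden units, h_i = tanh(Wh·h_{i-1} + Wx·x_i + b). The names (Wh, hs, dh_t) are illustrative, not from the article:

```python
import numpy as np

def grad_loss_wrt_Wh(Wh, hs, dh_t, t):
    """Accumulate dJ^(t)/dWh by backpropagating from step t back to step 1.

    Wh   : (H, H) repeated hidden-to-hidden weight matrix
    hs   : list of hidden states; hs[0] is the initial state h_0
    dh_t : (H,) gradient of the step-t loss w.r.t. the hidden state h_t
    """
    dWh = np.zeros_like(Wh)
    dh = dh_t
    # Walk right to left, i = t, ..., 1 (hs[0] is the initial state),
    # summing gradients cumulatively as we go.
    for i in range(t, 0, -1):
        # Backprop through the tanh nonlinearity: d tanh(z) = 1 - tanh(z)^2.
        dpre = (1.0 - hs[i] ** 2) * dh
        # Contribution of Wh's use at step i, added into the running sum.
        dWh += np.outer(dpre, hs[i - 1])
        # Pass the gradient back to the previous hidden state.
        dh = Wh.T @ dpre
    return dWh
```

A quick sanity check is to compare the returned dWh against a finite-difference estimate of the gradient.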

This is known as backpropagation through time (BPTT).
