What’s the Derivative of RNN at Step T?

Angelina Yang
3 min read · Apr 6, 2023


There are plenty of explanations elsewhere; here I'd like to share an example question in an interview setting.

One characteristic of an RNN language model is that it applies the same weight matrix repeatedly. So what is the derivative of the loss, say at step t, with respect to the repeated weight matrix of the hidden state, Wh? And how do you calculate it?

Source: A Brief Overview of Recurrent Neural Networks (RNN)

Here are some tips for readers’ reference:

Since the weight matrix Wh is shared across all time steps of the RNN, the derivative of the loss at step t with respect to Wh is the sum, over every time step i up to t, of the derivative of that loss with respect to Wh as it appears at step i.
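In symbols, a sketch of that statement (assuming the common notation where J^(t) is the loss at step t and the bar means "Wh as it appears at step i"; this notation is my gloss, not from the article):

```latex
\frac{\partial J^{(t)}}{\partial W_h}
  \;=\; \sum_{i=1}^{t} \left. \frac{\partial J^{(t)}}{\partial W_h} \right|_{(i)}
```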

To calculate it, backpropagate over the time steps from right to left, i = t, … 0, summing gradients as you go. You should compute them cumulatively, not separately.
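Here is a minimal sketch of that accumulation in NumPy, assuming a vanilla RNN with tanh hidden units, h_i = tanh(Wh·h_{i-1} + Wx·x_i + b). The names (Wh, hs, dh_t) are illustrative, not from the article:

```python
import numpy as np

def grad_loss_wrt_Wh(Wh, hs, dh_t, t):
    """Accumulate dJ^(t)/dWh by backpropagating from step t back to step 1.

    Wh   : (H, H) repeated hidden-to-hidden weight matrix
    hs   : list of hidden states; hs[0] is the initial state h_0
    dh_t : (H,) gradient of the step-t loss w.r.t. the hidden state h_t
    """
    dWh = np.zeros_like(Wh)
    dh = dh_t
    # Walk right to left, i = t, ..., 1 (hs[0] is the initial state),
    # summing gradients cumulatively as we go.
    for i in range(t, 0, -1):
        # Backprop through the tanh nonlinearity: d tanh(z) = 1 - tanh(z)^2.
        dpre = (1.0 - hs[i] ** 2) * dh
        # Contribution of Wh's use at step i, added into the running sum.
        dWh += np.outer(dpre, hs[i - 1])
        # Pass the gradient back to the previous hidden state.
        dh = Wh.T @ dpre
    return dWh
```

A quick sanity check is to compare the returned dWh against a finite-difference estimate of the gradient.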

This is known as backpropagation through time (BPTT).
