What Is “Gradient Clipping”?
Gradient clipping is a method in which the error derivative is changed, or clipped, to a threshold during backward propagation through the network, and the clipped gradients are then used to update the weights. It’s one of several ways to address the exploding gradient problem. There are plenty of in-depth explanations of this topic elsewhere, so here I’d like to share tips on what you can say in an interview setting.
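As a rough sketch of the simplest variant, clipping by value caps each element of the error derivative at the threshold. The NumPy example below is illustrative only; the gradient arrays and the threshold of 5.0 are hypothetical choices for the example.

```python
import numpy as np

def clip_gradients_by_value(grads, threshold):
    """Cap every gradient element to the range [-threshold, threshold]."""
    return [np.clip(g, -threshold, threshold) for g in grads]

# Hypothetical gradients for two weight tensors
grads = [np.array([0.5, -7.2, 3.1]),
         np.array([[12.0, -0.3], [0.8, -9.5]])]

clipped = clip_gradients_by_value(grads, threshold=5.0)
# The large entries (-7.2, 12.0, -9.5) are now capped at +/-5.0
```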
What is “gradient clipping”?
Here are some example answers for readers’ reference:
The main idea of gradient clipping is that if the norm of your gradient is greater than some threshold, which is a hyper-parameter you choose, then you scale that gradient down before applying the stochastic gradient descent (SGD) update. The intuition is that you still take a step in the same direction, but a smaller step.
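To make that answer concrete: under the standard norm-clipping rule, if ‖g‖ exceeds the threshold, the gradient is rescaled to (threshold / ‖g‖) · g. Below is a minimal PyTorch sketch of this idea using the built-in torch.nn.utils.clip_grad_norm_ utility; the tiny model, random batch, and max_norm value of 1.0 are hypothetical choices for illustration.

```python
import torch

# Hypothetical tiny model, optimizer, and batch for the sketch
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 10), torch.randn(32, 1)

# Forward and backward pass
loss = torch.nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()

# If the total gradient norm exceeds max_norm (the threshold hyper-parameter),
# rescale all gradients by max_norm / total_norm: same direction, smaller step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```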
Watch the explanation by Dr. Abby See from Stanford:
Happy practicing!
Note: There are different angles from which to answer an interview question. The author of this newsletter does not try to find a reference that answers a question exhaustively. Rather, the author would like to share some quick insights and help readers think, practice, and do further research as necessary.
Source of video: Stanford CS224N Lecture 7 (Winter 2019): NLP with Deep Learning — Vanishing Gradients, Fancy RNNs by Dr. Abby See
Thanks for reading my newsletter. You can also find the original post here, and follow me on LinkedIn!