What Happens to the Neural Network Gradients When Initialized with Zeros?

Angelina Yang
2 min read · Sep 6, 2022

There are plenty of in-depth explanations elsewhere, so here I'd like to share tips on what you can say in an interview setting.

What happens to the neural network gradients and weights when you initialize them with zeros?

Source: Initializing neural networks

Here are some example answers for readers’ reference:

Initializing all the weights with zeros leads the neurons to learn the same features during training.

In fact, any constant initialization scheme will perform very poorly. Consider a neural network with two hidden units, and assume we initialize all the biases to 0 and all the weights to some constant α. If we forward-propagate an input (x₁, x₂) through this network, the output of both hidden units will be relu(αx₁ + αx₂). Both hidden units therefore have identical influence on the cost, which leads to identical gradients. As a result, both neurons evolve symmetrically throughout training, effectively preventing different neurons from learning different things.
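To make the symmetry argument concrete, here is a minimal NumPy sketch (my own illustration, not from the original source) of a two-hidden-unit ReLU network with every weight set to the same constant α. The specific input, target, and MSE loss are arbitrary choices; the point is that both hidden units produce the same activation and receive the same gradient.

```python
import numpy as np

alpha = 0.5                      # constant used for every weight
x = np.array([1.0, 2.0])         # input (x1, x2)
y = 1.0                          # target

W1 = np.full((2, 2), alpha)      # hidden-layer weights, all equal to alpha
b1 = np.zeros(2)                 # biases initialized to 0
W2 = np.full(2, alpha)           # output weights, all equal to alpha

# Forward pass: both hidden units compute relu(alpha*x1 + alpha*x2)
z1 = W1 @ x + b1
h = np.maximum(z1, 0.0)
y_hat = W2 @ h

# Backward pass (MSE loss)
dy = 2 * (y_hat - y)
dW2 = dy * h                     # gradient w.r.t. output weights
dh = dy * W2
dz1 = dh * (z1 > 0)
dW1 = np.outer(dz1, x)           # gradient w.r.t. hidden weights

print("hidden activations:", h)  # both entries identical
print("dW1:\n", dW1)             # both rows identical -> symmetric updates
print("dW2:", dW2)               # both entries identical
```

Running this prints identical rows in dW1 and identical entries in dW2, so a gradient step keeps the two hidden units equal; the symmetry is never broken, no matter how long you train.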
