
How to Find Appropriate Initialization Values for Your Network Weights?

Angelina Yang
2 min read · Sep 6, 2022


There are plenty of in-depth explanations elsewhere, so here I'd like to share tips on what you can say in an interview setting.

How to find appropriate initialization values for your network weights?

Here are some example answers for readers’ reference:

Choosing appropriate initial weights plays a critical role in training your network. Initialization values that are too large or too small can lead to exploding or vanishing gradients.

To prevent the gradients of the network's activations from vanishing or exploding, we suggest sticking to the following rules of thumb (a quick numerical check follows the list):

- The mean of the activations should be zero.

- The variance of the activations should stay the same across every layer.
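
To see why these rules matter, here is a minimal NumPy sketch that forward-propagates random inputs through a few tanh layers and prints each layer's activation statistics. The layer sizes, depth, and standard deviation values are purely illustrative, not from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(512, 100))  # 512 random samples, 100 features
for l in range(5):
    # std 0.01 is "too small": the activation variance shrinks layer by layer.
    # Try 1.0 / np.sqrt(a.shape[1]) instead and the variance stays roughly constant.
    W = rng.normal(0.0, 0.01, size=(a.shape[1], 100))
    a = np.tanh(a @ W)
    print(f"layer {l + 1}: mean={a.mean():+.4f}, var={a.var():.4f}")
```

With the too-small standard deviation, the printed variance collapses toward zero within a few layers, which is exactly the vanishing-signal behavior the rules of thumb are meant to prevent.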

The recommended initialization is Xavier initialization (or one of its derived methods): all the weights of layer l are drawn randomly from a normal distribution with mean μ = 0 and standard deviation σ = 1/sqrt(n[l−1]), where n[l−1] is the number of neurons in layer l−1. Biases are initialized to zero.

Xavier initialization, for every layer l:

W[l] ∼ N(μ = 0, σ = 1/sqrt(n[l−1])), b[l] = 0
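
A minimal NumPy sketch of that rule follows; the helper name, layer sizes, and seed are illustrative, not from the article:

```python
import numpy as np

def xavier_init(layer_dims, seed=0):
    """Draw W[l] with mean 0 and std 1/sqrt(n[l-1]); set biases to zero."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_curr = layer_dims[l - 1], layer_dims[l]
        # standard deviation sigma = 1 / sqrt(n[l-1])
        params[f"W{l}"] = rng.normal(0.0, 1.0 / np.sqrt(n_prev), size=(n_curr, n_prev))
        params[f"b{l}"] = np.zeros((n_curr, 1))
    return params

params = xavier_init([784, 128, 64, 10])
print(round(params["W1"].std(), 3))  # close to 1/sqrt(784) ≈ 0.036
```

Note that the Glorot/Xavier initializers built into frameworks (for example torch.nn.init.xavier_normal_) use the two-sided variant σ = sqrt(2/(n[l−1] + n[l])), so their numbers differ slightly from the 1/sqrt(n[l−1]) form quoted here.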
