Data Science Interview Challenge — Mock Interview 🔛

Jun 20, 2022

Welcome to today’s data science interview challenge! Today we’ll do a review:

I’ve used some of these questions when interviewing candidates for machine learning data scientist roles. If you would like to do a trial run, time yourself: any set of 5 questions should take roughly 15–20 minutes to answer.

Here we go: ⏳

1. When building a neural network, what is the benefit of normalizing inputs?

See Answer

2. Consider two classification problems: one classifies images as cats or dogs, the other classifies images as day or night. For which one would you choose a deeper neural network?

See Answer

3. How should we evaluate a neural machine translation system automatically?

See Answer

4. What is a typical evaluation metric for language models?

See Answer

5. How do the loss functions differ between a single-layer, single-neuron, sigmoid-activated logistic regression and one with three neurons?

See Answer

6. When using gradient descent, why do we want to use a “batch” of examples rather than a single example from the training data set?

See Answer

7. What is the main disadvantage of the sigmoid activation function?

See Answer

8. A brevity penalty is needed when using the BLEU metric to evaluate neural machine translation systems. Why is the brevity penalty needed?

See Answer
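For reference while thinking about question 8, here is a minimal sketch of the brevity penalty as it is defined in the standard BLEU formulation: 1 when the candidate is longer than the reference, and exp(1 - r/c) otherwise. The function and parameter names are illustrative assumptions, not taken from any particular library.

```python
import math


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Standard BLEU brevity penalty for candidate length c and reference length r.

    Names here are hypothetical, chosen for illustration only.
    """
    if candidate_len == 0:
        # An empty candidate gets no credit at all.
        return 0.0
    if candidate_len > reference_len:
        # Candidates longer than the reference are not penalized here.
        return 1.0
    # Candidates no longer than the reference are scaled by exp(1 - r/c).
    return math.exp(1.0 - reference_len / candidate_len)
```

For example, a 5-token candidate scored against a 10-token reference has its BLEU score multiplied by exp(-1), while a 12-token candidate against the same reference is left unscaled.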

9. The famous ResNet architecture (He et al., 2015) trained a 152-layer deep neural network on ImageNet. What happens when we keep stacking more layers on a “plain” convolutional neural network?

ResNet Training and Testing Error Illustration (Source: Stanford CS231)

See Answer

10. A network does not perform better simply because it is deeper. The charts above compare the training and test errors of a 20-layer network and a 56-layer network. What might be the reason that the deeper network does not outperform the shallower one?

See Answer

11. Why is the sigmoid activation function useful?

See Answer
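As a refresher for the sigmoid questions (7 and 11), here is a small sketch of the function and its derivative in plain Python. The function names are illustrative assumptions; the branch on the sign of `x` is just one common way to avoid overflow in `math.exp`.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

Evaluating both functions at a few points, such as 0, 5, and -5, is a quick way to build intuition before answering.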

12. For language models, is a higher or a lower “perplexity” better?

See Answer
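If the definition of perplexity is hazy, the sketch below may help. It assumes you already have the probability the model assigned to each actual token in a held-out sequence, and computes perplexity as the exponential of the average negative log-probability per token. The function name and input format are assumptions for illustration.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Perplexity of a sequence, given the model's probability for each true token.

    Hypothetical helper: computes exp(-(1/N) * sum(log p_i)).
    """
    if not token_probs:
        raise ValueError("token_probs must contain at least one probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)
```

A model that assigns probability 0.25 to every true token has a perplexity of about 4, which matches the intuition of choosing uniformly among four options at each step.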

13. Which derivatives need to be calculated first in order to update the weights through backpropagation?

See Answer

14. When normalizing inputs for your neural network, on which dataset should the mean and standard deviation (μ, σ) of the input be calculated?

See Answer

Happy practicing! 🏆



Written by Angelina Yang
