There are plenty of explanations elsewhere; here I’d like to share some example questions as they might come up in an interview setting.
For a 32 by 32 by 3 input image, if we use 10 convolution filters of size 5 by 5 with a stride of 1 and padding of 2, what does the output activation map volume look like?
Here are some tips for readers’ reference:
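To calculate the spatial size of the output activation map of a convolutional layer from the input size, filter size, stride, and padding, you can use the following formula:
Output size = (W - F + 2P) / S + 1
where W is the input width (or height), F is the filter width (or height), P is the padding added on each side, and S is the stride.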
Let’s apply this formula to the given values:
- Input image size: 32 by 32 by 3
- Filter size: 5 by 5
- Stride: 1
- Padding: 2
We then have:
Output size = (32 - 5 + 2 × 2) / 1 + 1 = 32
So the resulting output activation map will have a size of 32 by 32 for each filter. Since we have 10 filters, the final output activation map volume size would be 32 by 32 by 10.
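To double-check the arithmetic, here is a minimal Python sketch, assuming square filters, the same padding on every side, and a channels-last (H, W, C) layout. The helper names `conv_output_size`, `output_volume` and `naive_conv` are hypothetical, made up for this illustration. The sketch applies the formula and then runs a deliberately slow, brute-force convolution on random data, so you can see the shape really comes out as 32 by 32 by 10.

```python
import random


def conv_output_size(w: int, f: int, p: int, s: int) -> int:
    """Spatial output size along one dimension: (W - F + 2P) / S + 1."""
    span = w - f + 2 * p
    # Assumption: we reject strides that do not tile the padded input evenly;
    # many frameworks instead round the division down.
    if span % s != 0:
        raise ValueError("filter does not tile the padded input evenly with this stride")
    return span // s + 1


def output_volume(in_shape, num_filters, f, s, p):
    """Output (height, width, depth) for an input of shape (H, W, C)."""
    h, w, _ = in_shape
    # Depth equals the number of filters, independent of the input depth C.
    return conv_output_size(h, f, p, s), conv_output_size(w, f, p, s), num_filters


def naive_conv(image, filters, s, p):
    """Brute-force convolution (cross-correlation, as used in CNNs).

    image:   nested list [H][W][C]
    filters: nested list [K][F][F][C]; each filter spans the full input depth C.
    Returns a nested list [H_out][W_out][K].
    """
    h, w, c = len(image), len(image[0]), len(image[0][0])
    f = len(filters[0])
    # Zero-pad height and width by p on each side.
    padded = [[[0.0] * c for _ in range(w + 2 * p)] for _ in range(h + 2 * p)]
    for i in range(h):
        for j in range(w):
            padded[i + p][j + p] = list(image[i][j])
    h_out = conv_output_size(h, f, p, s)
    w_out = conv_output_size(w, f, p, s)
    out = []
    for i in range(h_out):
        row = []
        for j in range(w_out):
            depth = []
            for filt in filters:
                # Dot product of one F x F x C filter with the patch under it.
                total = 0.0
                for di in range(f):
                    for dj in range(f):
                        pix = padded[i * s + di][j * s + dj]
                        weights = filt[di][dj]
                        for ch in range(c):
                            total += pix[ch] * weights[ch]
                depth.append(total)
            row.append(depth)
        out.append(row)
    return out


# The interview example: 32x32x3 input, 10 filters of 5x5 (x3), stride 1, padding 2.
random.seed(0)
image = [[[random.random() for _ in range(3)] for _ in range(32)] for _ in range(32)]
filters = [[[[random.random() for _ in range(3)] for _ in range(5)] for _ in range(5)]
           for _ in range(10)]

print(output_volume((32, 32, 3), num_filters=10, f=5, s=1, p=2))  # (32, 32, 10)
result = naive_conv(image, filters, s=1, p=2)
print(len(result), len(result[0]), len(result[0][0]))  # 32 32 10
```

The brute-force loop is only there to confirm the shape; real frameworks use much faster implementations. Each filter holds 5 × 5 × 3 weights because it extends through all 3 input channels, so the output depth depends only on the number of filters.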
Let’s check out the explanation by Serena Yeung from Stanford:
Note: There are different angles to answer an interview question. The author of this newsletter does not try to find a reference that answers a question exhaustively. Rather, the author would like to share some quick insights and help the readers to think, practice and do further research as necessary.
Source of answer: Lecture 5 | Convolutional Neural Networks by Serena Yeung