Computer Vision: MaxPooling and Dropouts
Let’s first have an overview of various elements of a typical CNN layer/operation:
Filter: Also called a kernel or feature detector. A filter is a small matrix of weights. A single convolutional layer can contain multiple filters, all of the same size. Each filter detects one specific feature, so using multiple filters lets the layer identify a set of different features in the image.
The size of the filter and the number of filters are hyperparameters. The elements inside the filter are its weights, and these weights are learned during training.
Feature map: The feature map stores the output of the convolution operation between the image and a filter. It is the input to the pooling layer that follows. The number of elements in a feature map equals the number of image sections visited as the filter moves across the image, and the number of feature maps equals the number of filters used.
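To make the relationship between image, filter and feature map concrete, here is a minimal sketch in plain Python. It assumes a stride of 1 and no padding, and the image values and both filters are made up for illustration. The helper name convolve2d is hypothetical and not taken from any library.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Like most CNN libraries, this does not flip the kernel, so strictly
    speaking it computes a cross-correlation.
    """
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # One output element per position the filter can occupy on the image.
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter element-wise with the image section and sum.
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative 4x4 "image" (made-up values).
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]

# Two same-sized 2x2 filters (made-up weights; in a real CNN these are learned).
filters = [
    [[1, 0], [0, 1]],
    [[1, -1], [1, -1]],
]

# One feature map per filter.
feature_maps = [convolve2d(image, f) for f in filters]

for fm in feature_maps:
    print(fm)
```

With a 4x4 image and 2x2 filters, each feature map is 3x3, because the filter fits in 3 positions along each axis. Two filters give two feature maps.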
Convolutional layers help make our network translation invariant. They look for patterns in the image and record whether they found the patterns we are after in each part of the image. Thus, they help us detect patterns no matter where they appear in the image.
But we don’t usually need to know exactly where in an image a pattern is, down to the specific pixel. It’s good enough to know the patterns’ rough location in the image-space. To achieve this, we use MaxPooling.
Let’s look at the images above. Suppose the grid on the left is the output of a convolutional filter that has already run over a small part of our image. We could pass its information directly to the next layer.
But if we can reduce the amount of information that we pass to the next layer, it will make the neural network’s job easier. That’s what max pooling does.
The idea of max pooling is to downsample the data by passing on only the most important bits: the largest value in each small region of the grid.
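As a rough sketch of this idea, the plain-Python function below applies max pooling to a small grid. It assumes a 2x2 window with a stride of 2, which is a common choice but not one the text specifies. The grid values and the helper name max_pool2d are made up for illustration.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep only the largest value from each pool_size x pool_size window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, height - pool_size + 1, stride):
        row = []
        for j in range(0, width - pool_size + 1, stride):
            # Collect every value in the current window and keep the maximum.
            window = [
                feature_map[i + di][j + dj]
                for di in range(pool_size)
                for dj in range(pool_size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Illustrative 4x4 output of a convolutional filter (made-up values).
grid = [
    [0.0, 0.1, 0.0, 0.3],
    [0.2, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.4, 0.1],
    [0.5, 0.1, 0.0, 0.2],
]

# Same grid, but the strong 0.9 response sits one pixel up and to the left.
shifted_grid = [
    [0.9, 0.1, 0.0, 0.3],
    [0.2, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.4, 0.1],
    [0.5, 0.1, 0.0, 0.2],
]

pooled = max_pool2d(grid)
pooled_shifted = max_pool2d(shifted_grid)

print(pooled)          # 2x2 grid: one value per 2x2 window
print(pooled_shifted)  # identical, because 0.9 stayed inside the same window
```

The 16 input values shrink to 4, so the next layer has a quarter as much data to process. The second grid shows why only the rough location survives: moving the strong response within the same window leaves the pooled output unchanged.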