Computer Vision: U-Net
U-Net is one of the best-known image segmentation architectures. It was proposed in 2015 by Olaf Ronneberger, Philipp Fischer, and Thomas Brox (University of Freiburg, Germany). [1]
The full paper is worth reading to really appreciate the architecture. What follows is an outline treatment, suitable for beginners.
U-Net is an end-to-end segmentation technique: it takes a raw image in and outputs a segmentation map of that image.
The U-Net architecture is a U-shaped, symmetric convolutional network with a down-sampling contraction path and an up-sampling expansion path. In the original paper [1] the convolutions are unpadded, so the segmented output is smaller than the raw input (a 388 x 388 map for a 572 x 572 input tile). U-Net is fully convolutional: it has no fully connected (dense) layers.
The input image is fed into the network and propagated through it, producing a segmentation map as the output.
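Why the output ends up smaller than the input can be checked with a little arithmetic. The sketch below is only an illustration, not code from the paper. It assumes the layout of the original paper [1]: four down-sampling levels, unpadded 3 x 3 convolutions (each trims 2 pixels from the width and height), 2 x 2 max pooling that halves the size, and 2 x 2 up-convolutions that double it. The function name `unet_output_size` and its `depth` parameter are hypothetical names chosen for this example.

```python
def unet_output_size(input_size: int, depth: int = 4) -> int:
    """Trace the side length of a square image through a U-Net.

    Assumes unpadded 3x3 convolutions, 2x2 max pooling with stride 2,
    and 2x2 up-convolutions, as in the original paper (an assumption;
    many later implementations pad the convolutions instead).
    """
    size = input_size

    # Contraction path: two unpadded 3x3 convs (-2 pixels each), then a 2x2 max pool.
    for level in range(depth):
        size -= 4
        if size <= 0 or size % 2 != 0:
            raise ValueError(
                f"size {size} at level {level} cannot be pooled cleanly; "
                "pick a different input size"
            )
        size //= 2

    # Bottom of the U: two more unpadded 3x3 convs.
    size -= 4
    if size <= 0:
        raise ValueError("input is too small for this depth")

    # Expansion path: a 2x2 up-convolution doubles the size, then two unpadded 3x3 convs.
    for _ in range(depth):
        size = size * 2 - 4

    return size


print(unet_output_size(572))  # 388
```

With padded ("same") convolutions, which many later implementations use, the trimming disappears and the output has the same size as the input, provided the input side length is divisible by 16.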
Contraction/down-sampling path (encoder path):
The encoder path captures the context of the image. It is just a stack of convolution and max pooling layers.
The encoding path has 4 blocks. Each block consists of
1) Two 3 x 3 convolution layers, each followed by a ReLU activation function (batch normalization is a common addition in later implementations; the original paper [1] does not use it).
2) One 2 x 2…