Computer Vision: U-Net
U-Net is one of the most famous image segmentation architectures. It was proposed in 2015 by Olaf Ronneberger, Philipp Fischer, and Thomas Brox (University of Freiburg, Germany).
Reading the full paper is the best way to appreciate the architecture. What follows is an outline, suitable for beginners.
U-Net is an end-to-end segmentation technique: it takes a raw image as input and outputs a segmentation map of that image.
The U-Net architecture is a U-shaped, symmetric convolutional network with a down-sampling contraction path and an up-sampling expansion path. Because the original network uses unpadded convolutions, the output segmentation map is smaller than the input image (388 x 388 for a 572 x 572 input tile). U-Net contains no fully connected layers: it is a fully convolutional network built from convolution, max pooling, and up-convolution layers.
The input image is fed into the network and propagated through both paths, producing a segmentation map as output.
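To see why the output is smaller than the input, the sketch below traces the spatial size through a 4-level U-Net. It assumes the setup of the original paper: unpadded 3 x 3 convolutions, 2 x 2 max pooling with stride 2, and 2 x 2 up-convolutions with stride 2. The names `double_conv` and `unet_output_size` are hypothetical helpers for this illustration, not part of any library.

```python
def double_conv(size: int) -> int:
    """Two unpadded (valid) 3x3 convolutions: each trims 1 pixel from every border."""
    if size < 5:
        raise ValueError("feature map too small for two 3x3 valid convolutions")
    return size - 4


def unet_output_size(size: int, depth: int = 4) -> int:
    """Spatial output size of a U-Net built from unpadded convolutions (sketch)."""
    # Contraction path: double conv, then 2x2 max pooling with stride 2.
    for _ in range(depth):
        size = double_conv(size)
        if size % 2:
            raise ValueError(f"odd size {size} before pooling; choose another input size")
        size //= 2
    # Bottom of the U: one more double conv.
    size = double_conv(size)
    # Expansion path: 2x2 up-convolution (doubles the size), then double conv.
    for _ in range(depth):
        size = double_conv(size * 2)
    # The final 1x1 convolution leaves the spatial size unchanged.
    return size


print(unet_output_size(572))  # 388
```

The same arithmetic shows which input sizes fit the network: `unet_output_size(570)` raises `ValueError`, because an odd-sized 279 x 279 map would reach the second pooling step.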
Contraction/down-sampling path (Encoder Path):
The encoder path captures the context of the image. It is just a stack of convolution and max pooling layers.
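The encoding path has 4 blocks. Each block consists of:
1) Two 3 x 3 convolution layers, each followed by a ReLU activation function. The original paper does not use batch normalization, although many modern implementations add it after each convolution.
2) One 2 x 2 max pooling layer with stride 2.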
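The block structure above can be sketched in plain Python for a single channel with a single kernel per convolution. This is a simplification: real blocks apply 64 or more kernels to multi-channel input. The helper names are hypothetical. The kernels are fixed here for illustration, whereas in U-Net they are learned.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv3x3_valid(x: Matrix, k: Matrix) -> Matrix:
    """Unpadded 3x3 convolution (cross-correlation, as deep learning libraries compute it)."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(x[i + di][j + dj] * k[di][dj] for di in range(3) for dj in range(3))
            for j in range(w - 2)
        ]
        for i in range(h - 2)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2x2(x: Matrix) -> Matrix:
    """2x2 max pooling with stride 2 (an odd trailing row or column is dropped)."""
    h, w = len(x), len(x[0])
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1]) for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]


def encoder_block(x: Matrix, k1: Matrix, k2: Matrix) -> Tuple[Matrix, Matrix]:
    """Conv + ReLU twice, then pool. Returns (pre-pool map for the skip connection, pooled map)."""
    features = relu(conv3x3_valid(relu(conv3x3_valid(x, k1)), k2))
    return features, max_pool2x2(features)


# Usage: a 12 x 12 toy image and an "identity" kernel that copies the center pixel.
image = [[float(i + j) for j in range(12)] for i in range(12)]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
skip, pooled = encoder_block(image, identity, identity)
print(len(skip), len(pooled))  # 8 4
```

With the identity kernel, each convolution trims one pixel from every border (12 to 10 to 8), and pooling halves the result (8 to 4).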
In the original paper, the input image tile is 572 x 572 with a single (grayscale) channel. Sixty-four 3 x 3 kernels produce a feature map of size 570 x 570 x 64, because each unpadded convolution trims one pixel from every border. A second such convolution gives a feature map of size 568 x 568 x 64. A 2 x 2 max pooling layer then downsamples it to 284 x 284 x 64.
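These sizes follow from two rules, shown here assuming stride-1 unpadded convolutions and stride-2 pooling as in the original paper. The notation is introduced here: $H$ is the feature map height (width behaves the same way) and $k$ is the kernel size.

```latex
\begin{aligned}
H_{\text{conv}} &= H_{\text{in}} - k + 1 && (\text{unpadded } k \times k \text{ convolution, stride } 1) \\
H_{\text{pool}} &= \lfloor H / 2 \rfloor && (2 \times 2 \text{ max pooling, stride } 2) \\
572 - 3 + 1 &= 570, \quad 570 - 3 + 1 = 568, \quad \lfloor 568 / 2 \rfloor = 284 &&
\end{aligned}
```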
Note that the number of feature channels doubles after each downsampling step: 64 in the first block, 128 in the second, 256 in the third, and 512 in the fourth, reaching 1024 at the bottom of the U.
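The following sketch tracks height, width, and channel count together through the encoder. It assumes the original paper's 572 x 572 input tile and 64 kernels in the first block. `encoder_shapes` and its parameters are hypothetical names chosen for this example.

```python
from typing import List, Tuple


def encoder_shapes(size: int = 572, base_channels: int = 64, depth: int = 4) -> List[Tuple[int, int, int]]:
    """(height, width, channels) after each block's two convolutions, plus the bottom of the U."""
    shapes = []
    channels = base_channels
    for _ in range(depth):
        size -= 4          # two unpadded 3x3 convolutions
        shapes.append((size, size, channels))
        size //= 2         # 2x2 max pooling halves height and width
        channels *= 2      # the next block uses twice as many kernels
    size -= 4              # two more convolutions at the bottom of the U
    shapes.append((size, size, channels))
    return shapes


for shape in encoder_shapes():
    print(shape)
```

The spatial size shrinks roughly by half at each level while the channel count doubles. The network therefore trades spatial detail for richer per-pixel features as it goes deeper.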
Expansion/up-sampling path (Decoder Path):
The decoder path enables precise localization, using transposed convolutions (an upsampling technique) to recover spatial resolution.
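A transposed convolution for a single channel can be sketched as follows. The sketch assumes a 2 x 2 kernel with stride 2, the configuration used in the original paper. `up_conv2x2` is a hypothetical name, and the kernel values are arbitrary rather than learned.

```python
from typing import List

Matrix = List[List[float]]


def up_conv2x2(x: Matrix, k: Matrix) -> Matrix:
    """2x2 transposed convolution with stride 2 on one channel.

    Each input pixel scales the whole 2x2 kernel and adds it into its own
    2x2 output patch. With stride 2 the patches never overlap, so the output
    is exactly twice the input height and width.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for a in range(2):
                for b in range(2):
                    out[2 * i + a][2 * j + b] += x[i][j] * k[a][b]
    return out


x = [[1.0, 2.0], [3.0, 4.0]]
k = [[1.0, 0.5], [0.25, 0.0]]
for row in up_conv2x2(x, k):
    print(row)
```

With an all-ones kernel, this reduces to nearest-neighbor upsampling. In a trained network, the kernels are learned, and each output channel sums contributions from every input channel.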
The expansion path has 4 blocks. Each block consists of:
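1) A 2 x 2 transposed convolution ("up-convolution", often called deconvolution) with stride 2, which doubles the spatial size and halves the number of feature channels.
2) Concatenation with the corresponding feature map from the contraction path, cropped to the same size, i.e., at every…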
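The cropping and concatenation step can be sketched as follows. Feature maps are represented as lists of channels, each an H x W matrix; this layout and the helper names are assumptions made for this example. In the original network, the deepest skip connection pairs a 64 x 64 encoder map with a 56 x 56 up-convolved map. The example below reproduces those sizes with one channel each.

```python
from typing import List

Matrix = List[List[float]]
FeatureMap = List[Matrix]  # a list of channels, each an H x W matrix


def center_crop(x: Matrix, height: int, width: int) -> Matrix:
    """Keep the central height x width window of a single channel."""
    top = (len(x) - height) // 2
    left = (len(x[0]) - width) // 2
    return [row[left:left + width] for row in x[top:top + height]]


def crop_and_concat(skip: FeatureMap, upsampled: FeatureMap) -> FeatureMap:
    """Crop the encoder map to the decoder map's size, then stack along channels."""
    height, width = len(upsampled[0]), len(upsampled[0][0])
    cropped = [center_crop(channel, height, width) for channel in skip]
    return cropped + upsampled  # channel concatenation is list concatenation here


skip = [[[float(100 * i + j) for j in range(64)] for i in range(64)]]
upsampled = [[[0.0] * 56 for _ in range(56)]]
merged = crop_and_concat(skip, upsampled)
print(len(merged), len(merged[0]), len(merged[0][0]), merged[0][0][0])  # 2 56 56 404.0
```

Cropping is needed only because the unpadded convolutions shrink the encoder and decoder maps by different amounts. Implementations that use padded convolutions can concatenate without cropping.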