Computer Vision: U-Net

Convolutional Networks for Biomedical Image Segmentation

Rahul S
5 min read · Nov 25, 2022


U-Net is one of the most famous image segmentation architectures. It was proposed in 2015 by Olaf Ronneberger, Philipp Fischer, and Thomas Brox (University of Freiburg, Germany). [1]

One should read the full paper to really relish the architecture. What follows is an outline treatment, suitable for beginners.

U-Net is an end-to-end segmentation technique: it takes a raw image in and outputs a segmentation map of that image.

The U-Net architecture is a U-shaped, symmetric convolutional network with a down-sampling contraction path and an up-sampling expansion path. Because the original paper uses unpadded convolutions, the output segmentation map is smaller than the raw input image (388 x 388 for a 572 x 572 input). U-Net is fully convolutional: it has no fully connected (dense) layers, only convolution, pooling, and up-convolution layers.

The input image is fed into the network and propagated through it, producing a segmentation map as output.

Source: Ronneberger O., Fischer P., Brox T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N., Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention — MI

Contraction/down sampling path (Encoder Path):

The encoder path captures the context of the image. It is just a stack of convolution and max pooling layers.

The encoding path has 4 blocks. Each block consists of:
1) Two 3 x 3 unpadded convolution layers, each followed by a ReLU activation. (The original paper does not use batch normalization; many later implementations add it.)
2) One 2 x 2 max pooling layer with stride 2.
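To make the block structure concrete, here is a minimal pure-Python sketch of one encoder block on a single-channel toy grid. The function names, the 8 x 8 input, and the identity kernels are illustrative assumptions; a real U-Net learns many kernels per layer and would be built with a deep learning framework.

```python
def conv3x3_valid(image, kernel):
    """Unpadded ('valid') 3x3 convolution of a 2D grid (cross-correlation form)."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(grid):
    """Element-wise ReLU: negative values become 0."""
    return [[max(0.0, v) for v in row] for row in grid]


def max_pool2x2(grid):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])
         for j in range(0, len(grid[0]), 2)]
        for i in range(0, len(grid), 2)
    ]


def encoder_block(image, kernel_a, kernel_b):
    """Two conv + ReLU steps, then pooling. Returns (features, pooled)."""
    x = relu(conv3x3_valid(image, kernel_a))
    x = relu(conv3x3_valid(x, kernel_b))
    return x, max_pool2x2(x)


# Toy 8x8 input and hypothetical kernels (identity: passes the centre pixel through).
image = [[float((r * 8 + c) % 5) for c in range(8)] for r in range(8)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

features, pooled = encoder_block(image, identity, identity)
print(len(features), len(features[0]))  # 4 4
print(len(pooled), len(pooled[0]))      # 2 2
```

The pre-pooling feature map is returned alongside the pooled one because the decoder later reuses it.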

In the original paper, the input image is 572 x 572 x 1 (a single-channel grayscale tile). The first convolution applies 64 kernels of size 3 x 3 without padding, producing a feature map of size 570 x 570 x 64. A second such convolution gives a feature map of size 568 x 568 x 64. A 2 x 2 max pooling layer then downsamples the feature map to 284 x 284 x 64.
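The sizes above follow from the standard output-size formula for convolution and pooling. A small sketch, with helper names chosen here for illustration:

```python
def conv_out(size, kernel=3, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial output size of a pooling layer along one dimension."""
    return (size - window) // stride + 1


after_conv1 = conv_out(572)          # unpadded 3x3 conv trims 2 pixels
after_conv2 = conv_out(after_conv1)  # and again
after_pool = pool_out(after_conv2)   # 2x2 pooling halves the size

print(after_conv1, after_conv2, after_pool)  # 570 568 284
```

With `padding=1` the convolutions would preserve the spatial size, which is a common choice in later implementations but not what the original paper does.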

Note that the number of feature maps doubles after each pooling step: 64 feature maps in the first block, 128 in the second, 256 in the third, and 512 in the fourth.
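As a rough sketch, the channel counts and spatial sizes of the four encoder blocks can be traced with a few lines of arithmetic. It assumes the paper's 572 x 572 input and unpadded 3 x 3 convolutions; the function name is made up for this example.

```python
def trace_encoder(size=572, base_channels=64, blocks=4):
    """Return (block, channels, size_after_convs, size_after_pool) per block."""
    rows = []
    channels = base_channels
    for block in range(1, blocks + 1):
        size -= 4                 # two unpadded 3x3 convs, each trims 2 pixels
        pooled = size // 2        # 2x2 max pooling with stride 2
        rows.append((block, channels, size, pooled))
        size = pooled
        channels *= 2             # channels double for the next block
    return rows


for block, channels, conv_size, pooled_size in trace_encoder():
    print(f"block {block}: {channels} channels, "
          f"{conv_size} x {conv_size} -> pooled {pooled_size} x {pooled_size}")
```

The trace ends with a 32 x 32 map entering the bottom of the U, where the channel count doubles once more to 1024.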

Expansion/Up sampling path (Decoder Path)

The decoder enables precise localization using transposed convolutions (an upsampling technique).

The expansion path has 4 blocks. Each block consists of:
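1) A 2 x 2 transposed convolution ("up-convolution", often called deconvolution) with stride 2, which doubles the spatial size and halves the number of feature channels.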
2) Concatenation with the corresponding cropped feature map from the contracting path. i.e. At every…
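The two steps listed above can be sketched at the level of shapes. The numbers come from the first decoder block in the paper's architecture figure (a 28 x 28 x 1024 bottleneck and a 64 x 64 x 512 encoder map); the helper names are assumptions for illustration, and the grid of coordinate tuples simply stands in for one channel of the encoder feature map.

```python
def upsampled_size(size, stride=2):
    """Spatial size after a 2x2 transposed convolution with the given stride."""
    return size * stride


def center_crop(grid, target):
    """Crop a 2D grid to target x target around its centre."""
    h, w = len(grid), len(grid[0])
    top = (h - target) // 2
    left = (w - target) // 2
    return [row[left:left + target] for row in grid[top:top + target]]


# Step 1: up-convolution doubles the spatial size and halves the channels.
up_size = upsampled_size(28)   # 56
up_channels = 1024 // 2        # 512

# Step 2: crop the matching encoder map (64 x 64, 512 channels) to 56 x 56,
# then concatenate along the channel axis.
skip = [[(r, c) for c in range(64)] for r in range(64)]
cropped = center_crop(skip, up_size)
concat_channels = 512 + up_channels

print(len(cropped), len(cropped[0]))  # 56 56
print(cropped[0][0])                  # (4, 4)
print(concat_channels)                # 1024
```

Cropping is needed because the unpadded convolutions in the encoder leave its feature maps larger than the upsampled decoder maps.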
