In semantic segmentation, we assign each pixel of an image to the class of the object or region it belongs to. Because the prediction is made for every pixel rather than once for the whole image, it is also called a dense prediction task. In this article, we introduce this important topic from an intuitive, non-mathematical perspective.
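To make "dense prediction" concrete, here is a minimal sketch, assuming a network has already produced one score per class for every pixel, stored as an array of shape (C, H, W). The class names and score values are made up for illustration. Taking the highest-scoring class at each pixel turns those scores into a label map with one class index per pixel.

```python
import numpy as np

# Hypothetical class list, for illustration only.
CLASSES = ["background", "monkey", "banana"]

def scores_to_label_map(scores: np.ndarray) -> np.ndarray:
    """Convert per-pixel class scores of shape (C, H, W) into an (H, W) label map.

    Each pixel receives the index of its highest-scoring class, so the output
    has one prediction per pixel rather than one per image.
    """
    if scores.ndim != 3:
        raise ValueError("expected scores with shape (C, H, W)")
    return np.argmax(scores, axis=0)

# Made-up scores for a tiny 2x3 image: one 2x3 slice per class.
scores = np.array([
    [[0.9, 0.1, 0.2], [0.8, 0.1, 0.1]],    # background
    [[0.05, 0.8, 0.1], [0.1, 0.7, 0.2]],   # monkey
    [[0.05, 0.1, 0.7], [0.1, 0.2, 0.7]],   # banana
])

label_map = scores_to_label_map(scores)
print(label_map.tolist())  # [[0, 1, 2], [0, 1, 2]]
print([[CLASSES[i] for i in row] for row in label_map])
```

Real segmentation networks produce these scores with learned layers; the argmax step shown here is only the final decoding.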
CLASSIFICATION
Multi-class classification and multi-label classification are two different tasks. In multi-class classification, each image is assigned exactly one class out of several possible classes. In multi-label classification, the model can assign several labels to the same image, one for each kind of object it finds there.
For example, given a picture of a monkey eating a banana, a multi-class classifier has to choose between labeling the image as a monkey or as a banana, whereas a multi-label classifier can output both the label for the monkey and the label for the banana. Given a picture of three snakes, however, a multi-label classifier still outputs the single label “snake”: it reports which classes are present, not how many instances of each there are. Telling the three snakes apart requires locating each one.
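One common way to represent the two kinds of training targets is a one-hot vector for multi-class and a multi-hot vector for multi-label. The sketch below assumes a hypothetical three-class vocabulary; the helper names are illustrative, not from any particular library. Note that a multi-hot vector records presence only, so repeating a label does not change it.

```python
import numpy as np

# Hypothetical label vocabulary, for illustration only.
CLASSES = ["monkey", "banana", "snake"]

def one_hot(label: str) -> np.ndarray:
    """Multi-class target: exactly one class is switched on."""
    vec = np.zeros(len(CLASSES), dtype=np.int64)
    vec[CLASSES.index(label)] = 1
    return vec

def multi_hot(labels: list[str]) -> np.ndarray:
    """Multi-label target: every class present in the image is switched on."""
    vec = np.zeros(len(CLASSES), dtype=np.int64)
    for label in labels:
        vec[CLASSES.index(label)] = 1  # setting to 1 again has no effect
    return vec

print(one_hot("monkey").tolist())                      # [1, 0, 0]
print(multi_hot(["monkey", "banana"]).tolist())        # [1, 1, 0]
print(multi_hot(["snake", "snake", "snake"]).tolist()) # [0, 0, 1]
```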
In short, in a multi-class setting the network is trained to recognize more than one class but assigns each image exactly one of them, while in a multi-label setting the network can report several classes at once for a single image.
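The difference also shows up in how a network's raw outputs (logits) are usually decoded. A minimal sketch, assuming one logit per class and a hypothetical vocabulary: multi-class applies a softmax across classes and keeps the single best class, while multi-label applies an independent sigmoid to each class and keeps every class above a threshold. The 0.5 threshold is an assumed default, not a rule.

```python
import numpy as np

CLASSES = ["monkey", "banana", "snake"]  # hypothetical vocabulary

def predict_multiclass(logits: np.ndarray) -> str:
    """Softmax over all classes, then return the single most likely class."""
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs = exp / exp.sum()
    return CLASSES[int(np.argmax(probs))]

def predict_multilabel(logits: np.ndarray, threshold: float = 0.5) -> list[str]:
    """Independent sigmoid per class; keep every class whose probability exceeds the threshold."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return [name for name, p in zip(CLASSES, probs) if p > threshold]

logits = np.array([2.0, 1.5, -3.0])  # made-up outputs for one image
print(predict_multiclass(logits))    # monkey
print(predict_multilabel(logits))    # ['monkey', 'banana']
```

With the same logits, the multi-class decoder is forced to pick one answer, while the multi-label decoder can keep both the monkey and the banana.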
OBJECT DETECTION
Now, moving beyond classification, we may be interested in knowing not just what is in the image but also where in the image it is. Identifying the location of an object…