Common Evaluation Metrics Used for Classification Tasks — An Introduction

Rahul S
2 min read · Aug 16

Classification is a supervised learning task in machine learning that involves predicting the class label of a given input. There are several metrics used to evaluate the performance of a classification model.


Accuracy is defined as the ratio of the number of correctly classified samples to the total number of samples. It is a useful metric when the classes are roughly balanced. Under significant class imbalance it can be misleading, because a model that always predicts the majority class still achieves high accuracy.
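As a minimal sketch of this definition, accuracy can be computed in plain Python. The function name `accuracy` and the labels below are made up for illustration and do not come from any library. The example also shows the imbalance problem: a "model" that always predicts the majority class scores 0.95 while missing every positive sample.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical imbalanced data: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class.
y_majority = [0] * 100

majority_accuracy = accuracy(y_true, y_majority)
print(majority_accuracy)  # 0.95, even though no positive is ever found
```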


Precision is defined as the ratio of true positives to the total number of positive predictions.

It measures the proportion of positive predictions that are correct.

Precision is useful when the cost of a false positive is high. For example, in medical diagnosis, a false positive can result in unnecessary treatment, which can be harmful to the patient.
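As a minimal sketch, precision can be computed in plain Python from the true and predicted labels. The helper name `precision`, its `positive` parameter, and the sample labels are illustrative assumptions, not part of any library. Returning 0.0 when the model makes no positive predictions is only one possible convention.

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): the share of positive predictions that are correct."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fp == 0:
        return 0.0  # assumed convention: no positive predictions -> 0.0
    return tp / (tp + fp)


# Made-up labels: 3 true positives, 2 false positives, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

precision_value = precision(y_true, y_pred)
print(precision_value)  # 0.6 (3 correct out of 5 positive predictions)
```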


Recall is defined as the ratio of true positives to the total number of actual positives.

It measures the proportion of actual positive samples that are correctly predicted.

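Recall is useful when the cost of a false negative is high. For example, in spam email detection with spam as the positive class, a false negative lets a spam or phishing message through to the user's inbox. (An important email wrongly sent to the spam folder is a false positive, which is a precision concern.)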
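As a minimal sketch, recall can be computed in plain Python from the true and predicted labels. The helper name `recall`, its `positive` parameter, and the sample labels are illustrative assumptions, not part of any library. Returning 0.0 when the data contains no actual positives is only one possible convention.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): the share of actual positives that are found."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp + fn == 0:
        return 0.0  # assumed convention: no actual positives -> 0.0
    return tp / (tp + fn)


# Made-up labels: 3 true positives, 2 false positives, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

recall_value = recall(y_true, y_pred)
print(recall_value)  # 0.75 (3 of the 4 actual positives are found)
```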

src: Confusion matrix by Kefei Lu

F1 Score:

The F1 score is the harmonic mean of precision and recall. It combines both metrics into a single score that summarizes the overall performance of the model. It is useful when both precision and recall are equally important.

However, the F1 score can be misleading when the classes are imbalanced. It ignores true negatives and depends on which class is treated as positive, so the same predictions can receive very different scores depending on that choice.
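The following is a minimal sketch of the F1 score in plain Python, computed as the harmonic mean of precision and recall. The function name `f1`, its `positive` parameter, and the labels are illustrative assumptions. The second part shows one way imbalance interacts with F1: for a model that always predicts the majority class, the score changes completely depending on which class is designated positive.

```python
def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # assumed convention: no true positives -> 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: precision is 0.6 and recall is 0.75.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
f1_value = f1(y_true, y_pred)

# Hypothetical imbalanced data: 90 samples of class 1, 10 of class 0,
# and a "model" that always predicts class 1.
y_true_imb = [1] * 90 + [0] * 10
y_pred_imb = [1] * 100
f1_majority_positive = f1(y_true_imb, y_pred_imb, positive=1)  # high
f1_minority_positive = f1(y_true_imb, y_pred_imb, positive=0)  # 0.0

print(round(f1_value, 3), round(f1_majority_positive, 3), f1_minority_positive)
```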


ROC-AUC stands for the area under the receiver operating characteristic curve.

It is a metric that evaluates the performance of a binary classification model. The ROC curve plots the true positive rate against the false positive rate as the decision threshold is varied.

src: ROC Curve by Martin Thoma

The area under the curve (AUC) measures the degree of separability between the positive and negative classes. A higher AUC indicates better performance: an AUC of 0.5 corresponds to random guessing, and an AUC of 1.0 to perfect separation.

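ROC-AUC summarizes performance across all decision thresholds, so it is useful for comparing models when no particular trade-off between false positives and false negatives has been fixed. It is less affected by class imbalance than accuracy, because the true positive rate and the false positive rate are each computed within a single class. With heavily imbalanced data it can still look optimistic, since a large number of negatives keeps the false positive rate low.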
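The construction described above can be sketched in plain Python: sweep the decision threshold over the model's scores, record a (false positive rate, true positive rate) point at each step, and integrate with the trapezoidal rule. The function names `roc_curve_points` and `auc`, and the labels and scores, are illustrative assumptions; the sketch assumes binary labels where 1 marks the positive class and a higher score means "more likely positive".

```python
def roc_curve_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Lower the threshold one distinct score at a time, highest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t != 1)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and scores: one negative (0.4) outranks one positive (0.35).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

roc_auc = auc(roc_curve_points(y_true, scores))
print(roc_auc)  # 0.75
```

This brute-force version recounts the samples at every threshold, which is fine for illustration but slow on large datasets.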

