Machine Learning: Confusion matrix in classification problems

Rahul S
3 min read · Apr 19, 2023

A confusion matrix is a table used to evaluate the performance of a classification model by comparing predicted values against actual values. It shows not only how often the model is right but also which kinds of mistakes it makes, which helps identify areas of improvement.

Suppose you are working for a bank and are responsible for assessing loan applications. You have built a machine learning model that predicts whether an applicant is likely to default on their loan or not. The model has been trained on historical data and has achieved an accuracy of 85%.

To evaluate the performance of the model beyond that single accuracy figure, you can use a confusion matrix. The matrix is constructed by counting how often each predicted value occurs together with each actual value.

Let’s assume that you have 1000 loan applications, of which 100 actually end in default. When you apply your model to these applications, you get the following results:

  • True Positive (TP): the model correctly predicted that 50 applicants would default on their loan (i.e., 50 of the 100 actual defaulters).
  • False Positive (FP): the model predicted that 100 applicants would default on their loan, but in reality they repaid (i.e., 900 − 800). This is a Type I error.
  • False Negative (FN): the model predicted that 50 applicants would not default on their loan, but in reality they did (i.e., 100 − 50). This is a Type II error.
  • True Negative (TN): the model correctly predicted that 800 applicants would not default on their loan (i.e., 800 of the 900 who repaid).
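
The four cells can be tallied directly from pairs of actual and predicted labels. The following is a minimal sketch in plain Python, assuming 1 means "default" and 0 means "no default". The label lists are synthetic, built only to match the 1000 applications in this example (100 defaulters, 900 non-defaulters), and the function name `confusion_counts` is hypothetical.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Tally TP, FP, FN, TN for binary labels (1 = default, 0 = no default)."""
    # Each key is an (actual, predicted) pair; missing pairs count as 0.
    pairs = Counter(zip(y_true, y_pred))
    return {
        "TP": pairs[(1, 1)],  # defaulted, predicted default
        "FP": pairs[(0, 1)],  # repaid, predicted default
        "FN": pairs[(1, 0)],  # defaulted, predicted no default
        "TN": pairs[(0, 0)],  # repaid, predicted no default
    }

# Synthetic labels: 100 actual defaulters followed by 900 non-defaulters.
y_true = [1] * 100 + [0] * 900
# Predictions arranged to match the example: 50 hits, 50 misses,
# 100 false alarms, 800 correct approvals.
y_pred = [1] * 50 + [0] * 50 + [1] * 100 + [0] * 800

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 50, 'FP': 100, 'FN': 50, 'TN': 800}
```

The four counts sum to 1000, and TP + TN = 850 is consistent with the 85% accuracy quoted earlier.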

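Using this information, you can construct a confusion matrix (rows are actual outcomes, columns are predictions):

                          Predicted: default    Predicted: no default
  Actual: default         TP = 50               FN = 50
  Actual: no default      FP = 100              TN = 800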

The confusion matrix provides a visual representation of the model’s performance. It can be used to calculate various metrics, such as precision, recall, and F1 score. These metrics can help identify areas of improvement and make necessary adjustments to the model.
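
As a rough illustration, the metrics named above follow from the four counts by their standard definitions. The sketch below plugs in the counts from the loan example (TP = 50, FP = 100, FN = 50, TN = 800); the variable names are illustrative only.

```python
# Counts from the loan example.
tp, fp, fn, tn = 50, 100, 50, 800

# Share of all predictions that were correct.
accuracy = (tp + tn) / (tp + fp + fn + tn)
# Of the applicants flagged as defaulters, the share who really defaulted.
precision = tp / (tp + fp)
# Of the actual defaulters, the share the model caught.
recall = tp / (tp + fn)
# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.85 precision=0.33 recall=0.50 f1=0.40
```

With these counts, the 85% accuracy sits alongside a precision of about 33% and a recall of 50%, a gap that the accuracy figure alone does not reveal.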

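For example, in this scenario the model produces 100 false positives: applicants flagged as likely defaulters who would in fact have repaid, so the bank would turn away good customers. It also produces 50 false negatives, missing half of the actual defaulters; these are the loans that get approved and then default. To address either issue, you may need to adjust the model’s decision threshold or consider additional factors when evaluating loan applications.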
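
To make the threshold adjustment concrete, here is a small sketch assuming the model outputs a default probability per applicant and flags anyone at or above a cutoff. The eight scores and labels are made-up data, not taken from the loan example, and the helper names `predict_default` and `count_errors` are hypothetical.

```python
def predict_default(scores, threshold):
    """Flag an applicant as a likely defaulter when the score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

def count_errors(y_true, y_pred):
    """Return (false positives, false negatives) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn

# Hypothetical default probabilities and true outcomes (1 = defaulted).
scores = [0.95, 0.80, 0.65, 0.55, 0.45, 0.40, 0.30, 0.10]
y_true = [1,    1,    0,    1,    0,    0,    1,    0]

for threshold in (0.35, 0.50, 0.70):
    fp, fn = count_errors(y_true, predict_default(scores, threshold))
    print(f"threshold={threshold:.2f} FP={fp} FN={fn}")
# threshold=0.35 FP=3 FN=1
# threshold=0.50 FP=1 FN=1
# threshold=0.70 FP=0 FN=2
```

On this toy data, raising the threshold reduces false positives at the cost of more false negatives, so the right cutoff depends on which error is more expensive for the bank.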

In summary, a confusion matrix is a valuable tool for evaluating the performance of a classification model and can be used to identify areas of improvement. It is widely used in various industries, such as healthcare, finance, and marketing, to assess the accuracy of predictive models.
