Machine Learning: Correlation and Covariance

Rahul S
4 min read · Aug 23


  • Covariance is a statistical measure that quantifies how two random variables change together.
  • Its sign indicates whether an increase in one variable tends to accompany an increase (positive covariance) or a decrease (negative covariance) in the other.
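  • The formula for the sample covariance between two variables, X and Y, with n data points is:

$$\operatorname{cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

    where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y.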
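
As a rough illustration, the sample covariance can be computed by hand in a few lines of Python. This is a minimal sketch that assumes the usual n - 1 denominator; the function name `sample_covariance` and the hours/scores data are made up for the example.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum the products of paired deviations from each mean
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

cov_hs = sample_covariance(hours, scores)
print(cov_hs)  # 13.75
```

The positive result says the two variables tend to rise together, but the value is in "hours × score points", so its size alone says little about how strong the relationship is.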


  • Correlation is a standardized measure that quantifies the strength and direction of the linear relationship between two variables.
  • Ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
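  • The formula for the sample correlation coefficient (Pearson correlation coefficient) between X and Y, with n data points, is:

$$r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

    where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y. Equivalently, it is the sample covariance divided by the product of the two sample standard deviations.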
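
The Pearson coefficient can be sketched the same way. The function name `pearson_correlation` and the data below are illustrative assumptions, not part of any particular library.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

r = pearson_correlation(hours, scores)
print(round(r, 3))  # close to 1: a strong positive linear relationship

# A perfectly decreasing linear relationship
r_neg = pearson_correlation([1, 2, 3], [6, 4, 2])
```

The (n - 1) factors cancel between numerator and denominator, which is why they do not appear in the code.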


  • Covariance measures the degree to which two variables change together, but it doesn’t provide a standardized measure. Correlation measures the strength and direction of the linear relationship between two variables, providing a standardized value between -1 and 1.
  • Covariance is in the units of the product of the variables’ units. Correlation is unitless.
  • Covariance can take any real value. Correlation is restricted to the range of -1 to 1.
  • Covariance’s sign shows the direction of the relationship, but its magnitude depends on the variables’ units, so it says little about strength. Correlation offers a clear interpretation of both the strength and the direction of the linear relationship.
  • Covariance is not standardized, so it is sensitive to changes in the scale of the variables: multiplying X by a constant c multiplies the covariance by c. Correlation is standardized, so it is unaffected by rescaling with a positive constant (a negative constant flips its sign), which allows easy comparison between different pairs of variables.
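
The scale point is easy to check numerically. The sketch below uses made-up data and small helper functions (`cov` and `corr` are hypothetical names) and re-expresses one variable in minutes instead of hours.

```python
import math


def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def corr(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


# Hypothetical data: study time vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
minutes = [60 * h for h in hours]  # same information, different unit

cov_hours = cov(hours, scores)
cov_minutes = cov(minutes, scores)    # 60 times larger than cov_hours
corr_hours = corr(hours, scores)
corr_minutes = corr(minutes, scores)  # same as corr_hours, up to rounding

print(cov_hours, cov_minutes)
print(corr_hours, corr_minutes)
```

Changing the unit inflates the covariance by the conversion factor while the correlation stays put, which is the practical reason correlation is easier to compare across variable pairs.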

Correlation is generally preferred for its standardized interpretation and scale-independence.

