Machine Learning: Correlation and Covariance

Rahul S
4 min read · Aug 23

COVARIANCE

  • A statistical measure that quantifies how two random variables change together.
  • Its sign indicates whether an increase in one variable tends to accompany an increase (positive covariance) or a decrease (negative covariance) in the other.
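  • The formula for the sample covariance between two variables, X and Y, with n data points (x_i, y_i) is:

$$\operatorname{cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y.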
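
The sample covariance is the sum of the products of each point's deviations from the two means, divided by n - 1. A minimal sketch in plain Python follows; the function name `sample_covariance` and the hours/score data are made up for illustration and do not come from the article.

```python
def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from each mean, divided by n - 1.
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Made-up illustrative data: hours studied vs. exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 70.0, 72.0]

cov_hours_score = sample_covariance(hours, score)
print(cov_hours_score)  # 13.75
```

The result is positive, so the two variables tend to rise together. Its units are hours times score points, so the number 13.75 says nothing on its own about how strong the relationship is.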

CORRELATION

  • A standardized measure that quantifies the strength and direction of the linear relationship between two variables.
  • Ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
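  • The formula for the sample correlation coefficient (the Pearson correlation coefficient, r) between X and Y, with n data points, is:

$$r = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $s_X$ and $s_Y$ are the sample standard deviations of X and Y.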
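
The Pearson coefficient is the covariance divided by the product of the two sample standard deviations; the n - 1 factors cancel, leaving a ratio of sums. The sketch below assumes plain Python lists as input. The function name `pearson_r` and all data are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-deviations
    sxx = sum(a * a for a in dx)              # sum of squared deviations of x
    syy = sum(b * b for b in dy)              # sum of squared deviations of y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 70.0, 72.0]
r_hours_score = pearson_r(hours, score)  # close to 1: strong positive linear trend

# A perfect but non-linear relationship (y = x^2 on symmetric x) gives r = 0.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_parabola = pearson_r(xs, [v * v for v in xs])

print(round(r_hours_score, 3), r_parabola)
```

The second case shows why 0 means "no linear correlation" rather than "no relationship": y is fully determined by x, yet r is zero.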

DIFFERENCES

  • Covariance measures the degree to which two variables change together, but it doesn’t provide a standardized measure. Correlation measures the strength and direction of the linear relationship between two variables, providing a standardized value between -1 and 1.
  • Covariance is in the units of the product of the variables’ units. Correlation is unitless.
  • Covariance can take any real value. Correlation is restricted to the range of -1 to 1.
  • The sign of the covariance gives the direction of the relationship, but its magnitude depends on the variables’ units, so it does not indicate strength. Correlation offers a clear interpretation of both the strength and the direction of the linear relationship.
  • Covariance is not standardized, so it is sensitive to changes in the scale of the variables: multiplying X by a constant multiplies the covariance by the same constant. Correlation is standardized, so it is unchanged when a variable is rescaled by a positive constant, which allows easy comparison between different pairs of variables.
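
The scale point can be checked numerically. The following sketch uses made-up height and weight values and a hypothetical helper `cov_and_r`; it converts heights from metres to centimetres and compares both measures before and after.

```python
import math


def cov_and_r(x, y):
    """Return (sample covariance, Pearson r) for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (n - 1), sxy / math.sqrt(sxx * syy)


# Made-up illustrative data.
height_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weight_kg = [50.0, 58.0, 66.0, 79.0, 85.0]

# Same heights expressed in centimetres instead of metres.
height_cm = [h * 100 for h in height_m]

cov_m, r_m = cov_and_r(height_m, weight_kg)
cov_cm, r_cm = cov_and_r(height_cm, weight_kg)

# Covariance grows by the rescaling factor; correlation stays the same.
print(cov_cm / cov_m)  # approximately 100
print(r_cm - r_m)      # approximately 0
```

Only the unit of measurement changed, yet the covariance is about 100 times larger, while the correlation is the same up to floating-point rounding.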

Correlation is generally preferred for its standardized interpretation and scale-independence.

LLM, NLP, Statistics, MLOps | Senior AI Consultant | IIT Roorkee | Connect: [https://www.linkedin.com/in/rahultheogre/]