Partial Least Squares (PLS): Taming High-Dimensional Data and Capturing Complex Relationships
Partial Least Squares (PLS) regression is a multivariate statistical technique for modeling the relationship between a set of predictor variables (X) and a response variable (Y).
It is particularly useful for datasets with high dimensionality, multicollinearity, or noisy variables. PLS finds a set of latent variables, called components, that capture the most important information shared by X and Y.
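As a concrete illustration of the kind of problem PLS handles well, the snippet below fits scikit-learn's PLSRegression (one common implementation) to synthetic data whose 50 collinear predictors are driven by just two hidden factors. The data-generating choices and parameter values here are purely illustrative.

```python
# Illustrative example: PLS on synthetic, highly collinear predictors.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p = 200, 50
latent = rng.normal(size=(n, 2))                                       # two hidden factors
X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))   # 50 collinear predictors
y = latent @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=n)          # response driven by the same factors

pls = PLSRegression(n_components=2)   # number of latent components to extract
pls.fit(X, y)
print("Training R^2:", round(pls.score(X, y), 3))
```

Even though ordinary least squares would struggle with the near-singular predictor matrix, two PLS components are enough here because they target exactly the directions in X that covary with y.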
Mathematics of PLS Regression:
Let’s consider a scenario with N observations and p predictor variables (attributes) in X, and a single response variable Y. PLS constructs a set of orthogonal components, which are linear combinations of the original predictor variables:
- Weights Calculation: PLS starts by finding a weight vector wk that maximizes the covariance between the projected predictors Xwk and Y. With a single response this weight vector has a closed form (proportional to XᵀY); with multiple responses it is obtained through iterative (NIPALS-style) weight updates.
- Scores Calculation: Once wk is determined, the scores tk are calculated by projecting X onto wk, i.e., tk = Xwk.
- Residual Calculation: The residuals Ek are obtained by removing the part of X and Y explained by the scores tk (deflation); the next component is extracted from these residuals.
- Loading Vector Calculation: The loading vector ck is calculated by regressing Y on the scores tk (and the X-loadings pk by regressing X on tk). These loadings determine how much of X and Y each component removes during deflation, as sketched in the code below.
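The steps above translate almost line for line into code. The following is a minimal NumPy sketch of the single-response case (PLS1), not a production implementation: the function names (pls1_fit, pls1_predict) are illustrative, the loadings are computed before the deflation (residual) step because deflation uses them, and the multi-response case would additionally iterate the weight update to convergence.

```python
# Minimal NumPy sketch of the PLS1 steps listed above (illustrative only).
import numpy as np

def pls1_fit(X, y, n_components):
    """Extract PLS components one at a time and return regression coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    X, y = X - x_mean, y - y_mean            # work with centered data
    n, p = X.shape
    W = np.zeros((p, n_components))          # weight vectors wk
    T = np.zeros((n, n_components))          # score vectors tk
    P = np.zeros((p, n_components))          # X-loadings pk
    c = np.zeros(n_components)               # y-loadings ck

    for k in range(n_components):
        w = X.T @ y                          # weights: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = X @ w                            # scores: projection of X onto wk
        tt = t @ t
        p_k = X.T @ t / tt                   # X-loading: regress X on the scores
        c_k = (y @ t) / tt                   # y-loading: regress y on the scores
        X = X - np.outer(t, p_k)             # residual Ek: deflate X
        y = y - c_k * t                      # residual fk: deflate y
        W[:, k], T[:, k], P[:, k], c[k] = w, t, p_k, c_k

    # Coefficients in the original predictor space: B = W (PᵀW)^(-1) c
    B = W @ np.linalg.solve(P.T @ W, c)
    return B, x_mean, y_mean

def pls1_predict(X_new, B, x_mean, y_mean):
    """Predict the response for new observations."""
    return (X_new - x_mean) @ B + y_mean
```

Because each score vector tk is computed from the deflated residuals of X, the scores are mutually orthogonal, which is what makes the final coefficient formula well defined and gives PLS its robustness to multicollinearity.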