Figure: Schematic illustration of partial least squares regression. Two blocks of data, X and Y, are projected by weight vectors w and c onto latent components t and u, and least squares regression is performed between the scores; p and q are the corresponding loading vectors. (Source: https://doi.org/10.1371/journal.pone.0179638.g001)

Partial Least Squares (PLS): Taming High-Dimensional Data and Capturing Complex Relationships

Rahul S

--

Partial Least Squares (PLS) regression is a multivariate statistical technique used for modeling the relationships between predictor variables (X) and a response variable (Y).

It is particularly useful when dealing with datasets that have high dimensionality, multicollinearity, or noisy variables. PLS aims to find a set of latent variables, called components, that capture the most important information from both X and Y.
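Before the mathematics, a quick feel for the workflow: the sketch below fits a PLS model using scikit-learn's PLSRegression. The synthetic data, the induced collinearity, and the choice of three components are illustrative assumptions, not part of the original article.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Synthetic, partly collinear predictors (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)   # near-duplicate column
y = 3 * X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=200)

# Fit PLS with a small number of latent components
pls = PLSRegression(n_components=3)
pls.fit(X, y)
print(pls.score(X, y))   # R^2 of the fitted model
```

Because the latent components summarize correlated predictors jointly, PLS tends to remain stable here, where ordinary least squares would be destabilized by the near-duplicate column.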

Mathematics of PLS Regression:

Let’s consider a scenario with N observations and p predictor variables (attributes) in X, and a single response variable Y. PLS constructs a sequence of mutually orthogonal components, each a linear combination of the original predictor variables, extracted one at a time as follows:

  1. Weights Calculation: PLS first finds a weight vector w_k that maximizes the covariance between X and Y. In the classical NIPALS algorithm this is w_k = Xᵀy / ‖Xᵀy‖.
  2. Scores Calculation: Once w_k is determined, the scores t_k are computed by projecting X onto w_k: t_k = X w_k.
  3. Loading Calculation: The loadings are obtained by regressing X and Y on the scores: p_k = Xᵀt_k / (t_kᵀt_k) and c_k = Yᵀt_k / (t_kᵀt_k).
  4. Deflation: The residuals E_k = X − t_k p_kᵀ and f_k = Y − c_k t_k are computed, and the procedure is repeated on them to extract the next component (a NumPy sketch of this loop follows the list).
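To make these steps concrete, here is a minimal NumPy sketch of the NIPALS PLS1 loop described above. The function name, the synthetic data, and the closed-form coefficient step B = W(PᵀW)⁻¹c are illustrative assumptions rather than code from a particular library.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Classical NIPALS loop for PLS1 (single response), as sketched above."""
    X = X - X.mean(axis=0)              # center predictors
    y = y - y.mean()                    # center response
    Xk, yk = X.copy(), y.copy()
    W, P, C = [], [], []

    for _ in range(n_components):
        # Step 1: weight vector with maximal covariance between X and y
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        # Step 2: scores by projecting X onto the weights
        t = Xk @ w
        tt = t @ t
        # Step 3: loadings by regressing X and y on the scores
        p_k = Xk.T @ t / tt
        c_k = (yk @ t) / tt
        # Step 4: deflation: subtract the explained part, keep residuals
        Xk = Xk - np.outer(t, p_k)
        yk = yk - c_k * t
        W.append(w); P.append(p_k); C.append(c_k)

    W, P, C = np.array(W).T, np.array(P).T, np.array(C)
    # Regression coefficients for centered data: B = W (P^T W)^{-1} c
    return W @ np.linalg.solve(P.T @ W, C)

# Toy usage on synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
B = pls1_nipals(X, y, n_components=2)
y_hat = (X - X.mean(axis=0)) @ B + y.mean()
```

Note that the returned coefficients apply to centered data, so predictions add back the mean of Y, as in the last line above.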
