Schematic illustration of partial least squares regression. Two blocks of data, X and Y, are projected by w and c onto latent components, t and u, and least squares regression is performed. p and q represent loading vectors.

Partial Least Squares (PLS): Taming High-Dimensional Data and Capturing Complex Relationships

Rahul S


Partial Least Squares (PLS) regression is a multivariate statistical technique used for modeling the relationships between predictor variables (X) and a response variable (Y).

It is particularly useful for datasets with many predictors, multicollinearity, or noisy variables. PLS finds a small set of latent variables, called components, chosen so that the projections of X have maximal covariance with Y.

Mathematics of PLS Regression:

Let’s consider a scenario with N observations and p predictor variables (attributes) in X, and a single response variable Y. PLS constructs a set of orthogonal components, which are linear combinations of the original predictor variables:

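  1. Weights Calculation: PLS starts by finding a unit-length weight vector w_k that maximizes the covariance between the projection X w_k and Y. With a single response this direction is proportional to X^T Y; with several responses it is found through iterations of weight updates.
  2. Scores Calculation: Once w_k is determined, the scores t_k are calculated by projecting X onto w_k, that is, t_k = X w_k.
  3. Loading Vector Calculation: The X loading vector p_k is calculated by regressing the columns of X on the scores t_k, and the coefficient c_k for the response is calculated by regressing Y on t_k.
  4. Residual Calculation: The part of the data explained by t_k is removed (deflation), leaving the residuals E_k = X - t_k p_k^T and F_k = Y - c_k t_k.
  5. Update Weight Vector: The next weight vector w_(k+1) is calculated from the residuals E_k and F_k in the same way as in step 1, which makes successive score vectors orthogonal.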

These steps are repeated on the residuals to obtain additional components. The number of components is a modeling choice and is commonly selected by cross-validation.
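
To make the component-by-component procedure concrete, here is a minimal NumPy sketch for the single-response case. It follows the standard NIPALS-style deflation form of PLS (weights, scores, loadings, then residuals), which is one common way to organize the computation and not necessarily the only one. The function names `pls1_fit` and `pls1_predict`, the dictionary keys, and the synthetic data are all assumptions made for illustration; the data are centered but not scaled.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS-style deflation (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean  # X residual, starts as centered X
    f = y - y_mean  # y residual, starts as centered y

    weights, loadings, scores, y_coefs = [], [], [], []
    for _ in range(n_components):
        w = E.T @ f                 # direction in X with maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:            # nothing left in X that covaries with y
            break
        w = w / norm
        t = E @ w                   # scores: projection of the X residual onto w
        tt = t @ t
        p = E.T @ t / tt            # X loading: regress each column of E on t
        c = (f @ t) / tt            # response coefficient: regress f on t
        E = E - np.outer(t, p)      # deflate X
        f = f - c * t               # deflate y
        weights.append(w)
        loadings.append(p)
        scores.append(t)
        y_coefs.append(c)

    if not weights:
        raise ValueError("no component could be extracted")

    W = np.column_stack(weights)
    P = np.column_stack(loadings)
    T = np.column_stack(scores)
    c_vec = np.array(y_coefs)
    # Regression coefficients expressed in the original (centered) predictors.
    coef = W @ np.linalg.solve(P.T @ W, c_vec)
    return {"coef": coef, "x_mean": x_mean, "y_mean": y_mean,
            "scores": T, "weights": W, "loadings": P}


def pls1_predict(model, X):
    """Predict with a model returned by pls1_fit."""
    X = np.asarray(X, dtype=float)
    return (X - model["x_mean"]) @ model["coef"] + model["y_mean"]


# Synthetic example (made-up data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)

model = pls1_fit(X, y, n_components=2)
y_hat = pls1_predict(model, X)
```

Two properties are worth checking on any such sketch: the columns of `model["scores"]` should be mutually orthogonal, and when as many components are extracted as X has independent columns, the coefficients should coincide with ordinary least squares.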

Advantages of PLS Regression:

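  1. Dealing with Collinearity: Because the regression is carried out on a few mutually orthogonal score vectors rather than on the correlated predictors themselves, PLS remains stable under multicollinearity and can be fitted even when there are more predictors than observations.
  2. Noise Reduction: PLS focuses on capturing the covariance between X and Y, so directions in X that are unrelated to Y receive little weight, which can help in reducing the impact of noisy variables.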
  3. Dimensionality Reduction: PLS reduces the dimensionality of the data by creating a smaller set of components, which can improve model interpretability.
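
The following sketch illustrates these points on made-up data with more predictors than observations, where all predictors are noisy copies of a single hidden factor. The setup (sizes, noise levels, variable names) is an assumption chosen for illustration, and only one PLS component is extracted, using the maximal-covariance direction for a single response.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_features = 15, 40            # more predictors than observations

z = rng.normal(size=n_obs)            # one hidden factor drives X and y
X = np.outer(z, rng.normal(size=n_features))
X += 0.05 * rng.normal(size=(n_obs, n_features))   # small measurement noise
y = 2.0 * z + 0.1 * rng.normal(size=n_obs)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Collinearity: the centered X cannot have full column rank here,
# so ordinary least squares has no unique solution.
rank_X = np.linalg.matrix_rank(Xc)

# One PLS component: weight vector, scores, and regression of y on the scores.
w = Xc.T @ yc
w /= np.linalg.norm(w)
t = Xc @ w                            # 40 predictors reduced to one score per row
c = (t @ yc) / (t @ t)
y_hat = y.mean() + c * t

r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum(yc ** 2)
print(f"rank of centered X: {rank_X} (of {n_features} columns)")
print(f"in-sample R^2 with one component: {r2:.3f}")
```

The R^2 reported here is computed on the training data, so it is an optimistic measure; a held-out set or cross-validation would give a fairer picture of predictive performance.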

Disadvantages of PLS Regression: