 Schematic illustration of partial least squares regression. Two blocks of data, X and Y, are projected by w and c onto latent components, t and u, and least squares regression is performed. p and q represent loading vectors. https://doi.org/10.1371/journal.pone.0179638.g001

# Partial Least Squares (PLS): Taming High-Dimensional Data and Capturing Complex Relationships

Partial Least Squares (PLS) regression is a multivariate statistical technique used for modeling the relationships between predictor variables (X) and a response variable (Y).

It is particularly useful when dealing with datasets that have high dimensionality, multicollinearity, or noisy variables. PLS aims to find a set of latent variables, called components, that capture the most important information from both X and Y.

## Mathematics of PLS Regression:

Let’s consider a scenario with N observations and p predictor variables (attributes) in X, and a single response variable Y. PLS constructs a set of orthogonal components, which are linear combinations of the original predictor variables:

1. Weights Calculation: PLS starts by finding a weight vector wk​ that maximizes the covariance between X and Y. This is achieved through iterations of weight updates.
2. Scores Calculation: Once wk​ is determined, the scores tk​ are calculated by projecting X onto wk​.
3. Residual Calculation: The residuals Ek​ (error) between Y and the scores tk​ are calculated.