In regularization, we add a penalty term to the loss function during model training. This penalty discourages the model from learning overly complex patterns from the training data.
In machine learning, models aim to minimize a loss function that quantifies the error between their predictions and the actual target values in the training data.
Regularization introduces an additional term to the loss function, which is a function of the model’s parameters (weights and biases). This term penalizes large parameter values, which in turn limits model complexity.
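To make this concrete, here is a minimal NumPy sketch of such a combined loss for a linear model. The names (`regularized_loss`, `lam`) are illustrative, not from any particular library:

```python
import numpy as np

def regularized_loss(w, X, y, lam):
    """Data-fit loss (mean squared error) plus an L2 penalty on the weights.

    w   : model parameters, shape (n_features,)
    X   : training inputs, shape (n_samples, n_features)
    y   : training targets, shape (n_samples,)
    lam : regularization strength λ; larger values penalize big weights more.
    """
    data_loss = np.mean((X @ w - y) ** 2)  # how well the model fits the data
    penalty = lam * np.sum(w ** 2)         # the added regularization term
    return data_loss + penalty
```

During training, an optimizer minimizes this sum, so lowering the data-fit loss at the cost of very large weights no longer pays off.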
Two Common Regularization Techniques:
a. L2 Regularization (Ridge Regression):
- L2 regularization adds a penalty term that is proportional to the square of the magnitude of the model’s parameters.
- The regularization term is typically expressed as λ * Σ(w_i^2), where λ is the regularization strength and w_i are the model parameters.
- By including this term in the loss function, the model is encouraged to keep the parameter values small, effectively reducing their impact on the predictions.
- This discourages the model from fitting noise (see the sketch after this list).
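As a quick illustration of the shrinkage effect, here is a sketch using scikit-learn’s Ridge on synthetic data (the data and the choice λ = 10 are assumptions for demonstration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=100)

# Ridge's alpha parameter plays the role of λ.
plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("unregularized:", np.round(plain.coef_, 2))
print("ridge, λ = 10:", np.round(ridge.coef_, 2))  # same signs, smaller magnitudes
```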
b. L1 Regularization (Lasso Regression):
- L1 regularization adds a penalty term that is proportional to the absolute values of the model’s parameters.
- The regularization term is typically expressed as λ * Σ|w_i|.
- L1 regularization encourages sparsity in the model, meaning it tends to drive some parameter values to exactly zero. This leads to feature selection, as features with zero weights are effectively removed from the model (see the sketch after this list).
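A matching sketch with scikit-learn’s Lasso (same assumed synthetic setup, with λ = 0.5 chosen for demonstration) shows the sparsity effect:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)

# Lasso's alpha plays the role of λ; with L1, weights for the two
# irrelevant features are typically driven to exactly 0.0.
lasso = Lasso(alpha=0.5).fit(X, y)
print(np.round(lasso.coef_, 2))
```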
When the model is trained, it aims to minimize the combined loss, which includes the data-driven loss (how well it fits the training data) and the regularization term.
The regularization term acts as a constraint that limits the model’s ability to fit the training data too closely.
The balance between fitting the data and minimizing the regularization term is controlled by the regularization strength (λ). A higher λ results in stronger regularization and a simpler model.
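Sweeping λ over a few values (again a sketch on the same assumed synthetic data) makes this trade-off visible; as λ grows, every coefficient shrinks toward zero and the model gets simpler:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)

# Larger λ (alpha) means stronger shrinkage of every weight.
for lam in [0.01, 1.0, 10.0, 100.0]:
    w = Ridge(alpha=lam).fit(X, y).coef_
    print(f"λ = {lam:>6}: {np.round(w, 2)}")
```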