Regularization adds a penalty term to the loss function during model training. This penalty discourages the model from learning overly complex patterns from the training data.
In machine learning, models aim to minimize a loss function that quantifies the error between their predictions and the actual target values in the training data.
Regularization introduces an additional term into the loss function, computed from the model’s parameters (weights and biases). This term penalizes large parameter values, and with them, model complexity.
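To make this concrete, here is a minimal sketch in NumPy, assuming a mean-squared-error data loss; the function names and the default λ of 0.1 are illustrative, not taken from any particular library.

```python
import numpy as np

def regularized_loss(y_true, y_pred, weights, penalty, lam=0.1):
    """Total loss = data loss + lam * penalty(parameters)."""
    data_loss = np.mean((y_true - y_pred) ** 2)  # error on the training data (MSE)
    return data_loss + lam * penalty(weights)    # penalty term discourages complexity

# Example usage with an L2-style penalty passed in as the penalty function
loss = regularized_loss(
    np.array([1.0, 2.0]), np.array([1.1, 1.8]),
    weights=np.array([0.5, -0.3]),
    penalty=lambda w: np.sum(w ** 2),
)
```

The penalty is passed in as a function here to emphasize that different regularization techniques differ only in how they score the parameters; the overall structure of the loss stays the same.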
Two Common Regularization Techniques:
a. L2 Regularization (Ridge Regression):
- L2 regularization adds a penalty term that is proportional to the square of the magnitude of the model’s parameters.
- The regularization term is typically expressed as λ * Σ(w_i^2), where λ is the regularization strength and w_i are the model parameters.
- By including this term in the loss function, the model is encouraged to keep the parameter values small, effectively reducing their impact on the predictions.
- This discourages the model from fitting noise, as the sketch after this list illustrates.
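Below is a minimal sketch of ridge regression trained by gradient descent; the helper names, learning rate, and λ value are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def ridge_loss(X, y, w, lam):
    """MSE data loss plus the L2 penalty lam * sum(w_i^2)."""
    residuals = X @ w - y
    return np.mean(residuals ** 2) + lam * np.sum(w ** 2)

def ridge_step(X, y, w, lam, lr=0.01):
    """One gradient-descent step on the ridge loss."""
    n = len(y)
    # Gradient of the MSE term plus the gradient of the L2 penalty (2 * lam * w)
    grad = (2.0 / n) * X.T @ (X @ w - y) + 2.0 * lam * w
    return w - lr * grad
```

Note the 2 * lam * w term in the gradient: it pulls every weight toward zero on each step, so weights that do not earn their keep by reducing the data loss decay away. This is why L2 regularization is also known as weight decay.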