XGBoost, or Extreme Gradient Boosting, is an ensemble learning algorithm primarily based on gradient boosting and optimization principles. It builds a strong predictive model by combining the predictions of multiple weak learners, often decision trees, through an iterative process.
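Before the breakdown, here is a minimal sketch of what training an XGBoost model typically looks like in Python using the scikit-learn-style wrapper. The dataset is synthetic and the parameter values are purely illustrative; exact defaults and parameter handling can vary slightly across xgboost versions.

```python
import numpy as np
from xgboost import XGBClassifier

# Synthetic binary-classification data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# n_estimators = number of boosting rounds (trees),
# learning_rate = shrinkage applied to each tree's contribution,
# reg_lambda   = L2 regularization on leaf weights.
model = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4,
    reg_lambda=1.0,
)
model.fit(X, y)

preds = model.predict(X)        # class labels
proba = model.predict_proba(X)  # class probabilities
print("training accuracy:", (preds == y).mean())
print("feature importances:", model.feature_importances_)
```

Each of these parameters maps onto a concept in the breakdown below.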
1. HOW IT WORKS
Here’s a concise technical breakdown of how XGBoost works:
- Gradient Boosting: XGBoost follows a boosting approach in which each new model corrects the errors of the previous ones, improving performance incrementally (a from-scratch sketch of this loop follows the list).
- Loss Function: It minimizes a loss function that quantifies the gap between predicted and actual values; common choices include mean squared error (for regression) and log loss (for classification). The full regularized objective is written out after this list.
- Gradient Descent: XGBoost minimizes the loss with a gradient-descent-like procedure in function space, computing the gradient of the loss with respect to the current model’s predictions (in practice it also uses the second derivative, a Newton-style refinement).
- Additive Learning: At each boosting iteration, a new decision tree (weak learner) is added to the ensemble. This tree aims to minimize the residual errors left by the previous trees.
- Weighted Updates: Each data point’s influence on the next tree comes from its gradient statistics, so points that are harder to predict (those with larger residual errors) effectively carry more weight, focusing the next model on rectifying those errors.
- Regularization: To prevent overfitting, XGBoost incorporates regularization terms (L1 and L2) that penalize complex models, encouraging simplicity.
- Learning Rate: It introduces a “learning rate” parameter controlling the step size of each iteration. A smaller rate slows learning, enabling finer adjustments.
- Feature Importance: XGBoost calculates feature importance scores by evaluating each feature’s contribution to reducing the loss function across all trees.
- Stopping Criteria: Training stops when predefined criteria are met, such as reaching a set number of trees or seeing negligible loss improvement on a held-out validation set (early stopping).
- Prediction: To make a prediction, XGBoost sums the weak learners’ outputs, each scaled by a “shrinkage” factor (the learning rate).
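For the mathematically inclined, the loss and regularization bullets above come together in the regularized training objective from the original XGBoost paper (Chen & Guestrin, 2016): the loss summed over the n training examples plus a complexity penalty summed over the K trees, where T is a tree’s number of leaves and w its vector of leaf weights.

```latex
\text{Obj} = \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}
```

The γ term penalizes each additional leaf and the λ term is the L2 penalty on leaf weights; the library also exposes an optional L1 penalty on leaf weights alongside these.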
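To make the loop itself concrete, here is a minimal from-scratch sketch of gradient boosting for squared-error regression. This is not XGBoost (no second-order statistics, no regularization, none of the systems-level optimizations), just the additive, gradient-driven loop the bullets describe; the data and all names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=500)

learning_rate = 0.1
n_rounds = 100

# Start from a constant prediction (the mean minimizes squared error).
base = y.mean()
prediction = np.full_like(y, base)
trees = []

for _ in range(n_rounds):
    # For squared error, the negative gradient of the loss with
    # respect to the current predictions is simply the residual.
    residuals = y - prediction

    # Fit a shallow tree (the weak learner) to the residuals.
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)
    trees.append(tree)

    # Additive update, shrunk by the learning rate.
    prediction += learning_rate * tree.predict(X)

# Final prediction = base value + shrunken sum of all tree outputs.
def predict(X_new):
    return base + learning_rate * sum(t.predict(X_new) for t in trees)

print("training MSE:", np.mean((y - prediction) ** 2))
```

XGBoost refines this loop with second-order gradient information, the regularized objective above, subsampling of rows and columns, and heavy engineering for speed, but the additive structure is the same.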
2. PROS
- High Accuracy: XGBoost routinely achieves state-of-the-art results on structured/tabular data, which is why it has long been a fixture of winning machine-learning competition entries.