Machine Learning: XGBoost — An intuition

Rahul S
2 min read · Sep 4, 2023

XGBoost, or Extreme Gradient Boosting, is an ensemble learning algorithm primarily based on gradient boosting and optimization principles. It builds a strong predictive model by combining the predictions of multiple weak learners, often decision trees, through an iterative process.


Here’s a concise technical breakdown of how XGBoost works:

  1. Gradient Boosting: XGBoost follows a boosting approach where each new model corrects the errors of the previous ones, leading to incremental performance improvement.
  2. Loss Function: It minimizes a loss function that quantifies the disparity between predicted and actual values, using common loss functions such as mean squared error (for regression) and log loss (for classification).
  3. Gradient Descent: XGBoost performs a form of gradient descent in function space. At each iteration it computes the gradient (and the second derivative, the Hessian) of the loss with respect to the current model’s predictions, and uses these to fit the next tree.
  4. Additive Learning: At each boosting iteration, a new decision tree (weak learner) is added to the ensemble. This tree aims to minimize the residual errors left by the previous trees.
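  5. Gradient-Driven Focus: Unlike AdaBoost, XGBoost does not explicitly reweight data points between rounds. Instead, each new tree is fit to the per-example gradients of the loss, so examples that are predicted poorly (larger residual errors) produce larger gradients and pull the next tree toward correcting them.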
  6. Regularization: To prevent overfitting, XGBoost adds regularization terms to the objective: L1 and L2 penalties on the leaf weights, plus a penalty on the number of leaves in each tree. These penalize complex models and encourage simpler trees.
  7. Learning Rate: It introduces a “learning rate” parameter that scales the contribution of each new tree. A smaller rate slows learning and makes finer adjustments, but usually requires more trees to reach the same training loss.
  8. Feature Importance: XGBoost calculates feature importance scores by evaluating each feature’s contribution to reducing the loss function across all trees.
  9. Stopping Criteria: Training stops when predefined criteria are met, like a set number of trees or negligible loss improvement.
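  10. Prediction: To make predictions, XGBoost sums the weak learners’ outputs, each scaled by a “shrinkage” factor (the learning rate), on top of an initial base prediction. For classification, this sum is then passed through a link function such as the sigmoid to produce a probability.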
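
To make the steps above concrete, here is a minimal sketch in plain Python. It is not the XGBoost library's implementation: it assumes a single numeric feature, squared-error loss, and depth-1 trees (stumps), and the names `fit_stump`, `stump_predict`, `boost`, and `predict` are made up for illustration. What it does borrow from XGBoost is the regularized leaf weight, -G / (H + lambda), and the split gain computed from summed gradients G and Hessians H, with `gamma` as the penalty for adding a leaf.

```python
# Illustrative sketch only: XGBoost-style boosting with decision stumps on one
# feature and squared-error loss. Function names are hypothetical, not xgboost API.

def fit_stump(x, grad, hess, reg_lambda=1.0, gamma=0.0):
    """Pick the single split with the highest regularized gain."""
    G, H = sum(grad), sum(hess)
    root_weight = -G / (H + reg_lambda)  # optimal weight if we do not split
    best = {"gain": 0.0, "threshold": None,
            "left": root_weight, "right": root_weight}
    order = sorted(range(len(x)), key=lambda i: x[i])
    GL = HL = 0.0
    for pos in range(len(order) - 1):
        i, j = order[pos], order[pos + 1]
        GL += grad[i]
        HL += hess[i]
        if x[i] == x[j]:
            continue  # no threshold fits between identical values
        GR, HR = G - GL, H - HL
        # Gain = loss reduction from splitting, minus the per-leaf penalty gamma.
        gain = 0.5 * (GL ** 2 / (HL + reg_lambda)
                      + GR ** 2 / (HR + reg_lambda)
                      - G ** 2 / (H + reg_lambda)) - gamma
        if gain > best["gain"]:
            best = {"gain": gain, "threshold": (x[i] + x[j]) / 2,
                    "left": -GL / (HL + reg_lambda),
                    "right": -GR / (HR + reg_lambda)}
    return best


def stump_predict(stump, xi):
    """Return the leaf weight for a single feature value."""
    if stump["threshold"] is None or xi < stump["threshold"]:
        return stump["left"]
    return stump["right"]


def boost(x, y, n_rounds=50, learning_rate=0.3, reg_lambda=1.0, gamma=0.0):
    """Additively fit stumps to the gradients of 0.5 * (pred - y)^2."""
    base = sum(y) / len(y)  # initial prediction: the mean target
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        grad = [p - t for p, t in zip(pred, y)]  # first derivative of the loss
        hess = [1.0] * len(y)                    # second derivative is constant
        stump = fit_stump(x, grad, hess, reg_lambda, gamma)
        if stump["threshold"] is None:
            break  # stopping criterion: no split has positive gain
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, xi)
                for p, xi in zip(pred, x)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, xi):
    """Base prediction plus the shrunken sum of all stump outputs."""
    total = sum(stump_predict(s, xi) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * total


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
    model = boost(xs, ys)
    print(len(model["stumps"]), predict(model, 2), predict(model, 7))
```

In this sketch, raising `reg_lambda` shrinks every leaf weight toward zero, raising `gamma` rejects splits whose gain is too small, and lowering `learning_rate` makes each stump contribute less, so more rounds are needed to fit the same data. The real library adds much more (deeper trees, multiple features, approximate split finding, other loss functions), none of which is shown here.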


  1. High Accuracy: XGBoost…