XGBoost Simply Explained (With an Example in Python)

This article will guide you through the nuances of the XGBoost algorithm, and how to use the XGBoost framework.


Boosting, especially of decision trees, is among the most prevalent and powerful machine learning algorithms.

There are many variants of boosting algorithms and frameworks implementing those algorithms. XGBoost—short for the exciting moniker extreme gradient boosting—is one of the most well-known algorithms with an accompanying, and even more popular, framework.

This article will guide you through the nuances of XGBoost (the algorithm) and how to use XGBoost (the framework).

*Looking for the Colab Notebook for this post? Find it right here.*

What is XGBoost?

The term “XGBoost” can refer to both a gradient boosting algorithm for decision trees and an open-source framework implementing that algorithm.

To disambiguate between the two meanings of XGBoost, we'll call the algorithm "XGBoost the Algorithm" and the framework "XGBoost the Framework."

To understand XGBoost the Framework, we first have to understand XGBoost the Algorithm.

What is XGBoost the Algorithm?

As the name may reveal, XGBoost the Algorithm is a gradient boosting algorithm, a common technique in ensemble learning.

To unpack that new phrase, ensemble learning is a type of machine learning that enlists many models to make predictions together. Boosting algorithms are distinguished from other ensemble learning techniques by building a sequence of initially weak models into increasingly powerful ones. Gradient boosting algorithms choose how to build each more powerful model using the gradient of a loss function that measures the current model's performance.
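To make that concrete, here is a minimal sketch of the gradient boosting idea, assuming scikit-learn and NumPy are installed. The dataset, number of rounds, and learning rate are all invented for illustration; the point is that each round fits a small tree to the current residuals, which for squared-error loss are exactly the negative gradients.

```python
# A minimal gradient boosting sketch (not XGBoost itself): for squared
# error, the negative gradient of the loss is the residual, so each round
# fits a small tree to the residuals and adds a damped copy to the ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
trees = []

for _ in range(100):
    residuals = y - prediction           # negative gradient of squared loss
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * stump.predict(X)
    trees.append(stump)

print("train MSE:", np.mean((y - prediction) ** 2))
```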

Gradient boosting is a foundational approach to many machine learning algorithms. XGBoost has solidified its name in the boosting game with its use in many competition-winning models and prolific reference in research.

How does XGBoost the Algorithm work?

XGBoost the Algorithm operates on decision trees: models that examine the input through a series of "if" conditions (the vertices of a tree-shaped graph). Whether each "if" condition is satisfied determines which condition is checked next and, ultimately, the prediction. XGBoost the Algorithm progressively adds more "if" conditions to its decision trees to build a stronger model.
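As an illustration, a decision tree can be written out as ordinary nested "if" statements. The feature names and thresholds below are invented purely for the example:

```python
# A hand-written toy decision tree, just to show that a tree is a chain of
# "if" conditions ending in a prediction at a leaf.
def predict_plays_tennis(humidity, wind_speed):
    if humidity > 75:            # first split
        if wind_speed > 20:      # second split on the "yes" branch
            return "no"
        return "yes"
    return "yes"                 # leaf reached directly

print(predict_plays_tennis(humidity=80, wind_speed=25))  # -> "no"
```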


What is XGBoost the Framework?

XGBoost the Framework implements XGBoost the Algorithm and other generic gradient boosting techniques for decision trees.

XGBoost the Framework is maintained by open-source contributors—it’s available in Python, R, Java, Ruby, Swift, Julia, C, and C++ along with other community-built, non-official support in many other languages.

XGBoost the Algorithm was first published by University of Washington researchers in 2016 as a novel gradient boosting algorithm. Like other gradient boosting algorithms on decision trees, XGBoost considers the leaves of the current decision tree and asks whether turning a leaf into a new “if” statement, with separate predictions on each branch, would benefit the model. The benefit depends on which “if” statement is chosen and which leaf it’s placed on, and it can be determined using the gradient of the loss. The loss includes a scoring function that measures model performance.

What sets XGBoost the Algorithm apart?

XGBoost the Algorithm sets itself apart from other gradient boosting techniques by using a second-order approximation of the scoring function. This approximation lets XGBoost calculate the optimal “if” condition and its impact on performance, and it can cache those statistics to avoid recomputing them when building the next decision tree.
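As a rough sketch, the split-gain calculation described in the XGBoost paper can be written out in a few lines. The gradients, Hessians, and regularization values below are illustrative, and the helper function is our own, not part of the library:

```python
# A sketch of the split-gain formula from the XGBoost paper, using the
# first-order gradients (g) and second-order Hessians (h) of the loss at
# each training point. lam (L2 penalty) and gamma (split penalty) are the
# usual regularization hyperparameters; the values below are illustrative.
import numpy as np

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    def score(g, h):
        return g.sum() ** 2 / (h.sum() + lam)
    gain = 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(np.concatenate([g_left, g_right]),
                          np.concatenate([h_left, h_right]))) - gamma
    return gain  # the split is worth making only if this is positive

# For squared-error loss: g = prediction - target, h = 1 for every point.
g = np.array([0.5, 0.4, -0.6, -0.3])
h = np.ones_like(g)
print(split_gain(g[:2], h[:2], g[2:], h[2:]))
```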

XGBoost the Algorithm is powerful on its own but also pairs well with the other tools in your machine learning toolbox. Consider feature engineering, for instance, where the machine learning engineer preprocesses the raw inputs into new input features before letting the model get its hands dirty. XGBoost the Algorithm makes the most of engineered features and can produce a nicely interpretable, high-performing model.
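For instance, here is a small, hypothetical sketch of pairing an engineered feature with XGBoost's scikit-learn-style API, assuming the xgboost and pandas packages are installed. The income_per_room ratio is an invented engineered feature, and the synthetic data exists only to make the example runnable:

```python
# Pairing an engineered feature with XGBoost. The feature importances make
# it easy to see how much the engineered ratio contributes to the model.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.uniform(20, 120, 500),
    "rooms": rng.integers(1, 8, 500),
})
df["income_per_room"] = df["income"] / df["rooms"]   # engineered feature
price = 3 * df["income_per_room"] + rng.normal(0, 5, 500)

model = XGBRegressor(n_estimators=50, max_depth=3).fit(df, price)
print(dict(zip(df.columns, model.feature_importances_)))
```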

How do you use XGBoost?

Together, XGBoost the Algorithm and XGBoost the Framework form a great pairing with many uses.

  • XGBoost the Algorithm trains faster than many other machine learning models and works well on categorical data and limited datasets.
  • XGBoost the Framework is highly efficient and developer-friendly with lots of documentation and online support.

These advantages make XGBoost (both the algorithm and the framework) useful for many machine learning applications.
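Here's a minimal end-to-end example in Python using XGBoost the Framework's scikit-learn-style API, assuming the xgboost and scikit-learn packages are installed (pip install xgboost scikit-learn). The hyperparameter values are illustrative defaults, not tuned recommendations:

```python
# Train and evaluate an XGBoost classifier on a built-in scikit-learn dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
```

From here, you can experiment with hyperparameters like max_depth, learning_rate, and n_estimators to trade off training speed against model quality.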

If you'd like to learn more and begin using XGBoost like a professional, check out Springboard's Machine Learning Career Track.

Is machine learning engineering the right career for you?

Knowing machine learning and deep learning concepts is important—but not enough to get you hired. According to hiring managers, most job seekers lack the engineering skills to perform the job. This is why more than 50% of Springboard's Machine Learning Career Track curriculum is focused on production engineering skills. In this course, you'll design a machine learning/deep learning system, build a prototype, and deploy a running application that can be accessed via API or web service. No other bootcamp does this.

Our machine learning training will teach you linear and logistic regression, anomaly detection, and how to clean and transform data. We’ll also teach you the most in-demand ML models and algorithms you’ll need to know to succeed. For each model, you will learn how it works conceptually first, then the applied mathematics necessary to implement it, and finally how to test and train it.

Find out if you're eligible for Springboard's Machine Learning Career Track.

Ready to learn more?

Browse our Career Tracks and find the perfect fit