{"id":14646,"date":"2020-09-28T02:38:53","date_gmt":"2020-09-28T09:38:53","guid":{"rendered":"https:\/\/www.springboard.com\/?p=14646"},"modified":"2023-09-28T00:23:11","modified_gmt":"2023-09-28T07:23:11","slug":"xgboost-random-forest-catboost-lightgbm","status":"publish","type":"post","link":"https:\/\/www.springboard.com\/blog\/data-science\/xgboost-random-forest-catboost-lightgbm\/","title":{"rendered":"XGBoost vs. CatBoost vs. LightGBM: How Do They Compare?"},"content":{"rendered":"\n<p>Random forests and decision trees are tools that every machine learning engineer wants in their toolbox.<\/p>\n\n\n\n<p>Think of a carpenter. When a carpenter is considering a new tool, they examine a variety of brands. Similarly, we\u2019ll analyze some of the most popular boosting techniques and frameworks so you can choose the best tool for the job.<\/p>\n\n\n\n<p>This article will guide you through decision trees and random forests in machine learning, and compare LightGBM vs. XGBoost vs. CatBoost.<\/p>\n\n\n\n<p>*Looking for the <strong>Colab Notebook<\/strong> for this post? Find it right here.*<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is a decision tree in machine learning?<\/h2>\n\n\n\n<p>If you are an aspiring <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/what-does-a-data-scientist-do\/\" target=\"_blank\" data-type=\"post\" data-id=\"24427\" rel=\"noreferrer noopener\">data scientist<\/a> working with machine learning, decision trees may help you produce clearly interpretable results and choose the best feasible option. Let\u2019s start by explaining decision trees. 
Decision trees are a class of machine learning models that can be thought of as a sequence of \u201cif\u201d statements applied to an input to determine the prediction.<\/p>\n\n\n\n<p>More rigorously, a decision tree incrementally constructs vertices within a tree, each representing a certain \u201cif\u201d statement, with child vertices connected to the parent by edges representing the possible outcomes of the parent vertex\u2019s \u201cif\u201d condition (in decision tree lingo, this is referred to as the cut).<\/p>\n\n\n\n<p>Eventually, after some sequence of \u201cif\u201d statements, a tree vertex will have no children but hold a prediction value instead. Decision trees can learn the \u201cif\u201d conditions and eventual prediction, but they notoriously overfit the training data. To prevent overfitting, decision trees are often purposefully underfit and cleverly combined to reach the right balance of bias and variance.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.springboard.com\/library\/static\/3b71ee0fb317d36e7e20f7a2b7c390dd\/c0388\/screen-shot-2020-12-01-at-3.24.06-pm.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.springboard.com\/library\/static\/3b71ee0fb317d36e7e20f7a2b7c390dd\/c0388\/screen-shot-2020-12-01-at-3.24.06-pm.png\" alt=\"decision tree machine learning\" title=\"decision tree machine learning\"\/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">What are random forests in machine learning?<\/h2>\n\n\n\n<p>Now we\u2019ll explore random forests, the brainchild of <a href=\"https:\/\/www.stat.berkeley.edu\/~breiman\/randomforest2001.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Leo Breiman<\/a>. Random forests are a type of ensemble learning: a collection of so-called \u201cweak learner\u201d models whose predictions are combined into a single prediction.<\/p>\n\n\n\n<p>In the case of random forests, the collection is made up of many decision trees. 
Random forests are considered \u201crandom\u201d because each tree is trained using a random subset of the training data (referred to as bagging in more general ensemble models) and random subsets of the input features (coined feature bagging in ensemble model speak) to obtain diverse trees.<\/p>\n\n\n\n<p>Bagging decreases the high variance of a weak learner model and its tendency to overfit a dataset. For random forests, both types of bagging are necessary. Without both, many of the trees could learn similar \u201cif\u201d conditions and end up highly correlated.<\/p>\n\n\n\n<p>Instead of bagging many weak learner models to prevent overfitting, an ensemble model may use a so-called boosting technique to train a strong learner from a sequence of weaker learners.<\/p>\n\n\n\n<p>In the case of decision trees, the weaker learners are underfit trees, and each subsequent tree adds \u201cif\u201d conditions that correct the errors of the trees before it.<\/p>\n\n\n\n<p><a href=\"https:\/\/xgboost.readthedocs.io\/en\/latest\/\" target=\"_blank\" rel=\"noreferrer noopener\">XGBoost<\/a>, <a href=\"https:\/\/catboost.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">CatBoost<\/a>, and <a href=\"https:\/\/lightgbm.readthedocs.io\/en\/latest\/\" target=\"_blank\" rel=\"noreferrer noopener\">LightGBM<\/a> have emerged as some of the most optimized implementations of gradient-boosted tree algorithms. 
Scikit-learn also has generic <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/classes.html#module-sklearn.ensemble\" target=\"_blank\" rel=\"noreferrer noopener\">implementations<\/a> of random forests and gradient-boosted tree algorithms, but they offer fewer optimizations and customization options than XGBoost, CatBoost, or LightGBM, and are often better suited for research than production environments.<\/p>\n\n\n\n<p>XGBoost, CatBoost, and LightGBM each have their own framework, distinguished by how the decision tree cuts are added iteratively.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">LightGBM vs. XGBoost vs. CatBoost<\/h2>\n\n\n\n<p>LightGBM is a boosting technique and framework developed by Microsoft. The framework implements the <a href=\"https:\/\/papers.nips.cc\/paper\/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">LightGBM algorithm<\/a> and is available in Python, R, and C. 
LightGBM is unique in that it can construct trees using Gradient-based One-Side Sampling, or GOSS for short.<\/p>\n\n\n\n<p>GOSS ranks the training instances by the magnitude of their gradients with respect to the loss function, then evaluates candidate cuts for the next underfit tree on a selection of the instances with the largest gradients plus a random sample of those with small gradients. GOSS allows LightGBM to quickly find the most influential cuts.<\/p>\n\n\n\n<p>XGBoost was originally produced by <a href=\"https:\/\/arxiv.org\/pdf\/1603.02754.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">University of Washington researchers<\/a> and is maintained by open-source contributors. XGBoost is available in Python, R, Java, Ruby, Swift, Julia, C, and C++. Similar to LightGBM, XGBoost uses the gradients of different cuts to select the next cut, but XGBoost also uses the Hessian, or second derivative, in its ranking of cuts. Computing this second derivative comes at a slight cost, but it also allows a better estimate of which cut to use.<\/p>\n\n\n\n<p>Finally, CatBoost is developed and maintained by the Russian search engine <a href=\"http:\/\/yandex.com\" target=\"_blank\" rel=\"noreferrer noopener\">Yandex<\/a> and is available in Python, R, C++, Java, and also Rust. CatBoost distinguishes itself from LightGBM and XGBoost by focusing on <a href=\"https:\/\/arxiv.org\/pdf\/1810.11363.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">optimizing decision trees for categorical variables<\/a>, or variables whose different values may have no relation with each other (e.g., apples and oranges).<\/p>\n\n\n\n<p>To compare apples and oranges in XGBoost, you\u2019d traditionally have to split them into two one-hot encoded variables representing \u201cis apple\u201d and \u201cis orange,\u201d but CatBoost handles categorical variables internally with no need for that preprocessing (LightGBM does support categories, but has more limitations than CatBoost).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">LightGBM vs. XGBoost vs. 
CatBoost: Which is better?<\/h2>\n\n\n\n<p>LightGBM, XGBoost, and CatBoost can all run on either CPUs or GPUs for accelerated learning, but their comparisons are more nuanced in practice. Each framework has an extensive list of tunable hyperparameters that affect learning and eventual performance.<\/p>\n\n\n\n<p>First off, CatBoost is designed for categorical data and is known to have the best performance on it, outperforming XGBoost and LightGBM on eight datasets in its official journal article. As of CatBoost version 0.6, a trained CatBoost tree can predict <a href=\"https:\/\/catboost.ai\/news\/best-in-class-inference-and-a-ton-of-speedups\" target=\"_blank\" rel=\"noreferrer noopener\">dramatically faster<\/a> than either XGBoost or LightGBM.<\/p>\n\n\n\n<p>On the flip side, some of CatBoost\u2019s internal identification of categorical data slows its training time significantly, but it is still reported to train much faster than XGBoost. LightGBM also boasts accuracy and training speed increases over XGBoost in five of the benchmarks examined in its original publication.<\/p>\n\n\n\n<p>But to XGBoost\u2019s credit, XGBoost has been around the block longer than either LightGBM or CatBoost, so it has better learning resources and a more active developer community. As a distributed gradient boosting library, XGBoost uses parallel tree boosting to solve numerous <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-definition\/\" target=\"_blank\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-definition\/\" rel=\"noreferrer noopener\">data science<\/a> problems quickly and accurately. 
It also doesn\u2019t hurt that XGBoost is substantially faster and more accurate than its predecessors and other competitors such as Scikit-learn.<\/p>\n\n\n\n<p>Each boosting technique and framework has a time and a place, and it is often not clear which will perform best until you test them all. Fortunately, <a href=\"https:\/\/arxiv.org\/pdf\/1809.04559.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">prior work<\/a> has done a decent amount of benchmarking of the three choices, but ultimately it\u2019s up to you, the engineer, to determine the best tool for the job.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Random forests and decision trees are tools that every machine learning engineer wants in their toolbox. Think of a carpenter. When a carpenter is considering a new tool, they examine a variety of brands. Similarly, we\u2019ll analyze some of the most popular boosting techniques and frameworks so you can choose the best tool for the job. 
[&hellip;]<\/p>\n","protected":false},"author":100,"featured_media":19003,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_eb_attr":"","_eb_data_table":"","footnotes":""},"categories":[67],"tags":[],"marketing_tags":[1466],"class_list":{"0":"post-14646","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science"},"acf":[],"_links":{"self":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/14646"}],"collection":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/users\/100"}],"replies":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/comments?post=14646"}],"version-history":[{"count":4,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/14646\/revisions"}],"predecessor-version":[{"id":50111,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/14646\/revisions\/50111"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media\/19003"}],"wp:attachment":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media?parent=14646"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/categories?post=14646"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/tags?post=14646"},{"taxonomy":"marketing_tags","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/marketing_tags?post=14646"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}