{"id":13200,"date":"2021-10-28T23:33:40","date_gmt":"2021-10-29T06:33:40","guid":{"rendered":"https:\/\/www.springboard.com\/?p=13200"},"modified":"2023-09-28T00:46:14","modified_gmt":"2023-09-28T07:46:14","slug":"what-is-logistic-regression","status":"publish","type":"post","link":"https:\/\/www.springboard.com\/blog\/data-science\/what-is-logistic-regression\/","title":{"rendered":"What is Logistic Regression? A Guide to the Formula &#038; Equation"},"content":{"rendered":"\n<p>As an aspiring data analyst\/data scientist, you have probably heard of algorithms that help classify, predict &amp; cluster information. Linear regression is one of the most common machine learning algorithms used to solve data problems, followed by the infamous logistic regression. <\/p>\n\n\n\n<p>Why infamous? Because it masquerades as a regression algorithm, just like linear regression. But don\u2019t get confused: logistic regression is, in fact, used as a classification algorithm. It is used to predict discrete values (binary values like 0\/1, yes\/no, true\/false) based on a given set of independent variable(s). In simple words, logistic regression predicts the probability of the occurrence of an event by fitting data to a logistic function, whose inverse is the logit function (hence the name LOGIsTic regression). 
Logistic regression predicts a probability, hence its output values lie between 0 and 1.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"318\" height=\"159\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/10\/logistic-regression-model.png\" alt=\"logistic regression model\" class=\"wp-image-46718\"\/><figcaption class=\"wp-element-caption\">Source: <a href=\"https:\/\/towardsdatascience.com\/\" data-type=\"URL\" data-id=\"https:\/\/towardsdatascience.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Towards Data Science<\/a><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">What is Logistic Regression: The Basis of the Logistic Regression Formula<\/h2>\n\n\n\n<p>Logistic regression is named for the function used at the core of the method, the logistic function. The logistic function, also called the sigmoid function, is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1, but never exactly at those limits.<\/p>\n\n\n\n<p><strong>1 \/ (1 + e^(-value))<\/strong><\/p>\n\n\n\n<p>Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u2018e\u2019 is the base of the natural logarithm<\/li>\n\n\n\n<li>\u2018value\u2019 is the actual numerical value that you want to transform<\/li>\n<\/ul>\n\n\n\n<p><strong>Did You Know?<br><\/strong>Numbers within any range can be transformed into a 0 to 1 range using a logistic\/sigmoid function. 
For instance, we applied the logistic function to values ranging from -10 to +10, and this is what our graph looks like: <\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"593\" height=\"371\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/10\/logistic-regression-graph.png\" alt=\"logistic regression graph\" class=\"wp-image-46720\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/10\/logistic-regression-graph.png 593w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/10\/logistic-regression-graph-400x250.png 400w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/10\/logistic-regression-graph-380x238.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2021\/10\/logistic-regression-graph-380x238.png 420w\" sizes=\"(max-width: 593px) 100vw, 593px\" \/><figcaption class=\"wp-element-caption\">This image was created by plotting the logistic function in Python<br><\/figcaption><\/figure>\n\n\n\n<p>Now that we know what the logistic function is, let\u2019s see how it is used in logistic regression.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What Does the Equation Look Like?<\/strong><\/h2>\n\n\n\n<p>Logistic regression is represented by an equation that is very much like the equation for linear regression. In the equation, input values are combined linearly using weights or coefficient values to predict an output value. A key difference from linear regression is that the output value being modeled is a binary value (0 or 1) rather than a numeric value. 
Here is an example of a logistic regression equation:<\/p>\n\n\n\n<p><strong>y = e^(b0 + b1*x) \/ (1 + e^(b0 + b1*x))<\/strong><\/p>\n\n\n\n<p>Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>x is the input value<\/li>\n\n\n\n<li>y is the predicted output<\/li>\n\n\n\n<li>b0 is the bias or intercept term<\/li>\n\n\n\n<li>b1 is the coefficient for the single input value (x)<\/li>\n<\/ul>\n\n\n\n<p>In the equation, each column in your input data has an associated b coefficient (a constant real value) that must be learned from your training data.<\/p>\n\n\n\n<p><strong>Did You Know?<\/strong><strong><br><\/strong>We can also transform this equation to:<\/p>\n\n\n\n<p><strong>ln(y \/ (1 \u2013 y)) = b0 + b1 * x<\/strong><br><br>This works because y \/ (1 \u2013 y) equals e^(b0 + b1*x), so taking the natural logarithm (ln) of both sides removes \u2018e\u2019. If you observe closely, the calculation on the right is the same as in linear regression, and the quantity on the left is the log of the odds of the default class.<\/p>\n\n\n\n<p><strong>Properties of the Logistic Regression Equation:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The dependent variable in logistic regression follows a Bernoulli distribution<\/li>\n\n\n\n<li>Estimation is done through maximum likelihood<\/li>\n\n\n\n<li>There is no R-squared; model fit is assessed with measures such as concordance and the KS statistic<\/li>\n<\/ul>\n\n\n\n<p><strong>When Implementing the Logistic Regression Model<\/strong><\/p>\n\n\n\n<p>The coefficients (Beta values b) of the logistic regression algorithm must be estimated from your training data using maximum-likelihood estimation. 
The best Beta values would result in a model that predicts a value very close to 1 for the default class and a value very close to 0 for the other class.<\/p>\n\n\n<div class=\"bg-leaf-50 p-4 my-3\"><h4 class=\"fw-bold text-center\">Get To Know Other Data Science Students<\/h4><div class=\"row row-cols-1 row-cols-lg-3\"><div class=\"col\"><div class=\"card success-story-card h-100 d-flex justify-content-between mb-0\"><div class=\"flex-grow-1 text-center\"><a class=\"d-inline-block rounded-circle\" href=\"\/success\/mikiko-bazeley\" style=\"width:125px;height:125px;overflow:hidden\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/res.cloudinary.com\/springboard-images\/image\/upload\/v1629203192\/Student%20Success\/Mikiko_Bazeley_125x125.png\" alt=\"Mikiko Bazeley\" style=\"object-fit:contain;max-width:170px;height:125px\" \/><\/a><p class=\"fw-bold mb-0\">Mikiko Bazeley<\/p><p class=\"text-muted lh-1\">ML Engineer at MailChimp<\/p><\/div><div class=\"w-100 d-block d-md-none mt-3\"><\/div><p class=\"mb-0 mx-auto text-center\"><a class=\"btn btn-primary mx-auto\" href=\"\/success\/mikiko-bazeley\">Read Story<\/a><\/p><\/div><\/div><div class=\"col d-none d-md-block\"><div class=\"card success-story-card h-100 d-flex justify-content-between mb-0\"><div class=\"flex-grow-1 text-center\"><a class=\"d-inline-block rounded-circle\" href=\"\/success\/aaron-pujanandez\" style=\"width:125px;height:125px;overflow:hidden\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/res.cloudinary.com\/springboard-images\/image\/upload\/v1629203192\/Student%20Success\/Aaron_Pujanandez_125x125.png\" alt=\"Aaron Pujanandez\" style=\"object-fit:contain;max-width:170px;height:125px\" \/><\/a><p class=\"fw-bold mb-0\">Aaron Pujanandez<\/p><p class=\"text-muted lh-1\">Dir. 
Of Data Science And Analytics at Deep Labs<\/p><\/div><p class=\"mb-0 mx-auto text-center\"><a class=\"btn btn-primary mx-auto\" href=\"\/success\/aaron-pujanandez\">Read Story<\/a><\/p><\/div><\/div><div class=\"col d-none d-md-block\"><div class=\"card success-story-card h-100 d-flex justify-content-between mb-0\"><div class=\"flex-grow-1 text-center\"><a class=\"d-inline-block rounded-circle\" href=\"\/success\/lou-zhang\" style=\"width:125px;height:125px;overflow:hidden\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/res.cloudinary.com\/springboard-images\/image\/upload\/v1629225421\/Student%20Success\/Lou_Zhang_icon.png\" alt=\"Lou Zhang\" style=\"object-fit:contain;max-width:170px;height:125px\" \/><\/a><p class=\"fw-bold mb-0\">Lou Zhang<\/p><p class=\"text-muted lh-1\">Data Scientist at MachineMetrics<\/p><\/div><p class=\"mb-0 mx-auto text-center\"><a class=\"btn btn-primary mx-auto\" href=\"\/success\/lou-zhang\">Read Story<\/a><\/p><\/div><\/div><\/div><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Logistic Regression Assumptions<\/strong><\/h3>\n\n\n\n<p>While logistic regression seems like a fairly simple algorithm to adopt &amp; implement, there are a lot of restrictions around its use. For instance, it generally needs a reasonably large dataset to produce stable estimates. Similarly, the dataset needs to satisfy several assumptions before this machine learning algorithm can be applied. <\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The dependent variable has to be binary in a binary logistic equation<\/li>\n\n\n\n<li>The factor level 1 of the dependent variable should represent the desired outcome<\/li>\n\n\n\n<li>Including non-meaningful variables may introduce errors into the model. 
Only include variables that are necessary and may show a correlation with the outcome<\/li>\n\n\n\n<li>The model should have little or no multicollinearity &#8211; the independent variables should not be highly correlated with each other<\/li>\n\n\n\n<li>The independent variables are linearly related to the log odds<\/li>\n<\/ul>\n\n\n\n<p>With so many assumptions that need to be made, you may think that the equation is not versatile enough to be implemented across real-life problems, but this equation has a lot of applications in the medical field, is wildly popular among data scientists, and is helping people across the world with its superpower.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Where is this Equation Applied in Real Life?<\/strong><\/h4>\n\n\n\n<p>In instances where a binary response is expected\/implied, a logistic regression equation is commonly used. While it has found its best use case in the field of medicine, the applications are far-reaching.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Determining the probability of having a heart attack<\/strong> &#8211; Medical researchers use data on predictor variables to estimate whether an individual will have a heart attack or not. The results of the model tell the researchers how changes in exercise and weight (predictor variables) affect the probability that a given individual has a heart attack. A fitted logistic regression model is used here. 
This offers us a clear picture of the significance of <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-definition\/\" target=\"_blank\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-definition\/\" rel=\"noreferrer noopener\">data science<\/a> in the medical profession and the application of the equation.<br><\/li>\n\n\n\n<li><strong>Understanding the possibility of getting admission into a University<\/strong> &#8211; College application aggregators want to know how variables like CGPA, GMAT &amp; TOEFL scores affect the probability of getting accepted to a particular university. For this, the aggregators perform a logistic regression to understand the relationship between the predictor variables and the probability of getting accepted.<br><\/li>\n\n\n\n<li><strong>Gmail &amp; other inboxes identifying \u2018Spam Emails\u2019 <\/strong>&#8211; One of the most clearly visible examples of how this works is the filtering that email inboxes do. Identifying if email communication is promotional\/spam is done by understanding the predictor variables &amp; applying a logistic regression algorithm to check for its authenticity. <\/li>\n<\/ol>\n\n\n\n<p>Now that we have understood the basic math behind logistic regression and how the logistic function behaves, along with the assumptions that we should keep in mind while <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/introduction-regression-classification-machine-learning\/\" target=\"_blank\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/blog\/data-science\/introduction-regression-classification-machine-learning\/\" rel=\"noreferrer noopener\">approaching a dataset<\/a> with logistic regression, the next step is to learn how we can implement this algorithm in Python and how it can generate favourable outcomes for us. 
We will publish part two of this article in a couple of days.<\/p>\n\n\n\n<p>You can learn this classification technique &amp; many more with Springboard\u2019s <a href=\"https:\/\/www.springboard.com\/courses\/data-analytics-career-track\/\" target=\"_blank\" rel=\"noreferrer noopener\">data analytics<\/a>, <a href=\"https:\/\/www.springboard.com\/courses\/data-science-career-track\/\" target=\"_blank\" rel=\"noreferrer noopener\">data science<\/a>, and <a href=\"https:\/\/www.springboard.com\/courses\/ai-machine-learning-career-track\/\" target=\"_blank\" rel=\"noreferrer noopener\">AI\/machine learning<\/a> career track programs. With 1:1 mentoring and a project-based curriculum that comes with a job guarantee, you can kickstart your career in a data-centric world with these specially designed programs.<\/p>\n\n\n\n<p class=\"rm has-background\" style=\"background-color:#efeff6\"><strong>Since you\u2019re here\u2026<\/strong> Are you interested in this career track? Investigate with our free guide to <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/what-does-a-data-scientist-do\/\" data-type=\"post\" data-id=\"24427\">what a data professional <em>actually<\/em> does<\/a>. When you\u2019re ready to build a CV that will make hiring managers melt, join our <a href=\"https:\/\/www.springboard.com\/courses\/data-science-career-track\/\" data-type=\"URL\" data-id=\"https:\/\/www.springboard.com\/courses\/data-science-career-track\/\" target=\"_blank\" rel=\"noreferrer noopener\">Data Science Bootcamp<\/a>, which will help you land a job or get your tuition back!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As an aspiring data analyst\/data scientist, you would have heard of algorithms that help classify, predict &amp; cluster information. Linear regression is one of the most common machine learning algorithms that is used in solving data problems, followed by the infamous logistic regression. Why infamous? 
Because it disguises as a regression algorithm, just like linear [&hellip;]<\/p>\n","protected":false},"author":100,"featured_media":7463,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_eb_attr":"","_eb_data_table":"","footnotes":""},"categories":[67],"tags":[],"marketing_tags":[1476],"class_list":{"0":"post-13200","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science"},"acf":[],"_links":{"self":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/13200"}],"collection":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/users\/100"}],"replies":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/comments?post=13200"}],"version-history":[{"count":3,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/13200\/revisions"}],"predecessor-version":[{"id":47730,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/13200\/revisions\/47730"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media\/7463"}],"wp:attachment":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media?parent=13200"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/categories?post=13200"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/tags?post=13200"},{"taxonomy":"marketing_tags","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/marketing_tags?post=13200"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}