Scikit-Learn: A Complete Guide With a Logistic Regression Example

In this article, we will focus on logistic regression and its implementation on the MNIST dataset using Scikit-Learn, a free software machine learning library for Python.

Scikit-Learn is a machine learning library that includes many supervised and unsupervised learning algorithms. To date, Scikit-Learn is the first stop for most data scientists and machine learning engineers to build their first machine learning model or set a benchmark for further experiments. This is handy because you don’t always need complex and computationally expensive deep learning algorithms to model your data. 

In this article, we will focus on logistic regression and its implementation on the MNIST dataset using the Scikit-Learn library.

What is Scikit-Learn logistic regression used for?

There are two primary problems in supervised machine learning: regression and classification. Despite its name, logistic regression is a classification algorithm (the name is a "false friend": it does not perform regression). It is used for problems such as determining whether a tumor is malignant or benign, or classifying vehicle types.

In simple terms, logistic regression is the process of finding the best possible plane (the decision boundary, Figure 1) that separates the classes under consideration. Because this boundary is linear, the method works best when the classes are (at least approximately) linearly separable.

Figure 1: Sample decision plane in 2D (Source: jeremyjordan.me)

Since linear regression is a fundamental building block of machine learning, we’ll use this concept as a jumping-off point to explain the mathematics of logistic regression.

The main difference between linear regression and logistic regression is the output function. Linear regression uses a linear function that outputs continuous values in any range, whereas logistic regression uses a sigmoid function that limits outputs in the range of zero to one.

  • Sigmoid function or logistic function

Mathematically, the sigmoid function can be described as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

This limits the output to the range of zero to one, as shown in Figure 2.

Figure 2: Sigmoid function. (Source: Wikipedia)
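To make the squashing behavior concrete, here is a minimal sketch of the sigmoid function in plain Python. The function name `sigmoid` and the sample inputs are illustrative choices, not part of the original article. The branch on the sign of `z` is only a numerical-stability detail; both branches compute the same mathematical function.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: 1 / (1 + e^(-z))."""
    # Two algebraically equivalent forms, chosen so that math.exp
    # is never called with a large positive argument (which would overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Large negative inputs approach 0, large positive inputs approach 1,
# and z = 0 maps exactly to 0.5.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```

The value 0.5 at z = 0 is what makes 0.5 a natural threshold for turning the output into a class label.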

  • Hypothesis

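For linear regression, the hypothesis function can be written as:

$$h_\theta(x) = \theta^T x$$

This is a simple linear function (a straight line). For logistic regression, it is modified by passing the linear output through the sigmoid function $\sigma$:

$$h_\theta(x) = \sigma(\theta^T x)$$

Hence:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

Here $h_\theta(x)$ is the hypothesis (the predicted output), $x$ is the input feature vector, and $\theta$ is the vector of model parameters.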
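As an illustration of the two hypotheses described above, the following sketch computes the linear output and then passes it through the sigmoid. The parameter values in `theta`, the feature values in `x`, and the function names are hypothetical, chosen only for this example. The first feature is fixed at 1 so that `theta[0]` acts as the intercept.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_hypothesis(theta, x):
    """Linear regression hypothesis: the dot product theta^T x."""
    return sum(t * xi for t, xi in zip(theta, x))


def logistic_hypothesis(theta, x):
    """Logistic regression hypothesis: sigmoid(theta^T x)."""
    return sigmoid(linear_hypothesis(theta, x))


# Hypothetical values for illustration only.
theta = [-1.0, 2.0, 0.5]  # theta[0] is the intercept
x = [1.0, 0.8, -0.4]      # x[0] = 1 is the bias term

z = linear_hypothesis(theta, x)    # unbounded real number
h = logistic_hypothesis(theta, x)  # always between 0 and 1
predicted_class = 1 if h >= 0.5 else 0

print(f"linear output z = {z:.3f}")
print(f"logistic output h = {h:.3f}, predicted class = {predicted_class}")
```

The linear output can take any real value, while the logistic output can be read as the estimated probability of class 1. Thresholding at 0.5 is equivalent to checking whether the linear output is non-negative, which is why the decision boundary is a plane.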

  • Cost function

In simple terms, the cost function measures the performance of any given machine learning model with respect to data under consideration. This cost function is used to optimize the parameters of the machine learning model after each iteration, during the training phase, to get more accurate predictions. 

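The cost function for logistic regression is given by:

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$

This can be further simplified to:

$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$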

This cost function is also known as negative log-likelihood loss or cross-entropy loss. 

Figure 3: Cost function (Source: Researchgate)

Figure 3 depicts the cost function. When “y” is one and “h” approaches zero (blue line), the cost grows very large, severely penalizing the machine learning model. When “y” is one and “h” is also one (blue line), the cost is zero, meaning there is no penalty for a correct prediction. Similarly, when “y” is zero and “h” approaches one (red line), the penalty is high, whereas when “y” is zero and “h” is also zero (red line), the penalty is zero.
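The four cases described above can be checked numerically with a small sketch of the per-example cross-entropy cost. The function name `cross_entropy_cost`, the clipping constant `eps`, and the sample predictions are assumptions made for this illustration. Clipping is a common practical safeguard against taking the logarithm of zero, not part of the mathematical definition.

```python
import math


def cross_entropy_cost(y: int, h: float, eps: float = 1e-15) -> float:
    """Per-example cost: -y*log(h) - (1 - y)*log(1 - h)."""
    # Clip h away from exactly 0 and 1 so math.log never receives 0.
    h = min(max(h, eps), 1.0 - eps)
    return -(y * math.log(h) + (1 - y) * math.log(1.0 - h))


# (true label, predicted probability) pairs covering the four cases.
cases = [(1, 0.99), (1, 0.01), (0, 0.01), (0, 0.99)]
for y, h in cases:
    print(f"y = {y}, h = {h:.2f} -> cost = {cross_entropy_cost(y, h):.4f}")
```

Confident correct predictions produce a cost near zero, while confident wrong predictions produce a large cost. In practice this per-example cost is typically averaged over all training examples.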

Implementing logistic regression on the MNIST dataset

In this section, we will implement logistic regression on the MNIST dataset, a well-known benchmark in the machine learning community. The dataset consists of labeled pictures of handwritten digits. Each image is a 28 x 28 pixel square, and the labels range from zero to nine. With ten classes, this is a multinomial logistic regression problem.

By default, Scikit-Learn takes care of this distinction: its logistic regression implementation handles the problem as binary or multinomial depending on the number of distinct labels present in the dataset.

The code for implementing logistic regression with Scikit-learn on MNIST dataset can be found here. This includes a detailed implementation of the logistic regression model with Scikit-learn.
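For readers who want something runnable on the page, the following is a minimal sketch of the workflow, not the linked implementation. As an assumption made to keep the example small and free of downloads, it uses the 8 x 8 digits dataset bundled with Scikit-Learn (`load_digits`) as a stand-in for MNIST. The full 28 x 28 MNIST data can be fetched with `fetch_openml("mnist_784")`, which needs network access and takes noticeably longer to train on. The split ratio, `random_state`, scaling step, and `max_iter` value are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for MNIST: 1,797 images of 8 x 8 pixels, flattened to 64 features.
X, y = load_digits(return_X_y=True)

# Hold out 20% of the data for evaluation, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale features, then fit logistic regression. With ten distinct labels,
# the estimator handles the multiclass case without extra configuration.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")

# Each row of predict_proba holds one probability per digit class.
probabilities = model.predict_proba(X_test[:1])
print("Predicted digit:", model.predict(X_test[:1])[0])
print("Class probabilities:", probabilities.round(3))
```

Scaling the features first helps the solver converge. Swapping in MNIST changes only the data-loading step, since the estimator API is the same.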
