Descriptive vs Inferential Statistics in Data Science

“Torture numbers, and they will confess to anything” is a famous quote from Gregg Easterbrook. It’s true, but what should the methodology of this torture be? There is a lot of debate on this, and one age-old debate is Descriptive vs Inferential Statistics. These two are the major divisions of the field of statistics. In most research conducted on groups of individuals, the Descriptive vs Inferential Statistics argument seems redundant, because researchers typically need both to analyze a dataset fully and draw conclusions from it. Although some statistical measures appear in both, their methods and goals are quite different.

Here’s a list of topics this blog covers:

What is Statistics
What is Descriptive Statistics
What is Inferential Statistics
Key differences between Descriptive and Inferential Statistics

Descriptive vs Inferential Statistics: How Do They Differ?

Before we look at the differences between descriptive and inferential statistics, it is important to understand what statistics is, what descriptive statistics is, and what inferential statistics is.

What is Statistics?

Statistics is a mathematical body of science that concerns the collection, analysis, interpretation, and presentation of data. For instance, a basic visualisation like a pie chart might give us some high-level information, but with statistics we get to operate on data in a much more systematic, information-driven manner. The mathematics involved helps us gain deeper insight into the structure of the data and, based on that structure, helps us apply other data science techniques to derive more information from it.

Although statistics can often be the most intimidating aspect of data science for aspiring data scientists to learn, the power of data science comes from a deep understanding of statistics itself. Let’s move ahead and discuss some basic terminology one should be aware of while dealing with statistics.

Population: The complete set of elements under study, grouped together by one or more common features.
Sample: A subset of observations taken from a population. To support sound conclusions, it should be drawn so that it represents the population without bias.

population and sample group — Source: Research methodology

Variable: A variable, which may also be called a data item, is any characteristic, number, or quantity that can be measured or counted.
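The three terms above can be made concrete with a small sketch. The school size, the mark range, and the names `population` and `sample` are assumptions made up for illustration; they do not come from any real dataset.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: exam marks (0-100) for all 1,000 students of a school.
population = [random.randint(0, 100) for _ in range(1000)]

# A simple random sample of 100 students, drawn without replacement.
sample = random.sample(population, k=100)

# The variable under study is "exam mark"; each list element is one observation.
print(len(population), len(sample))
```

Drawing the sample at random is one common way to reduce bias, since every student has the same chance of being selected.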

What is Descriptive Statistics?

Descriptive Statistics is a major division of statistics that describes a large body of data with summary statistics, charts, and tables. On its own, it does not allow us to draw conclusions beyond the data we have collected, nor to reach a conclusion regarding any of our hypotheses. By simply describing our collected dataset, Descriptive Statistics enables us to present raw data in a more meaningful way.

For example, imagine we had the results of 50 pieces of students’ coursework and wanted to check the overall performance of these students. Descriptive Statistics plays a significant role here: it lets us see the distribution of marks, such as a pattern in which only a few students score very high or very low marks while most score around the average.
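As a rough illustration of such a distribution, the sketch below counts marks in 20-point bands and prints a text histogram. The `marks` list and the `band` helper are hypothetical and use 20 made-up values rather than the 50 results from the example.

```python
from collections import Counter

# Hypothetical marks for illustration only (not data from the article).
marks = [35, 48, 52, 55, 58, 61, 63, 64, 66, 67,
         68, 70, 71, 72, 72, 74, 75, 77, 81, 93]

def band(mark):
    """Map a mark to a 20-point band label such as '60-79'."""
    low = min(mark // 20 * 20, 80)  # a mark of 100 goes in the top band
    high = 100 if low == 80 else low + 19
    return f"{low}-{high}"

# Count how many marks fall into each band.
distribution = Counter(band(m) for m in marks)

for label in sorted(distribution):
    print(label, "#" * distribution[label])
```

With these made-up values, most marks land in the middle band and only a few fall at either end, which is the pattern described above.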

Types of Descriptive Statistics

Typically, people use the following two types of Descriptive Statistics when describing their data:

1. Measures of Central Tendency: A measure of central tendency is a summary statistic that represents the central or typical value of a frequency distribution (FD). In short, these measures indicate where most values in a distribution fall. The arithmetic mean, median, and mode are the most common measures of central tendency.

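Mean: The numerical average of all values, that is, their sum divided by their count.
Median: The middle value of the dataset once the values are sorted (or the average of the two middle values when the count is even).
Mode: The most frequent value in the dataset.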

In our example, FD is simply the distribution of the marks scored by 50 students.
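A minimal sketch of the three measures, using Python's standard `statistics` module. The `marks` list is a hypothetical set of ten values chosen for illustration.

```python
import statistics

# Hypothetical marks for ten students (illustration only).
marks = [55, 61, 64, 68, 72, 72, 72, 75, 80, 91]

mean_mark = statistics.mean(marks)      # sum of the values divided by their count
median_mark = statistics.median(marks)  # middle value of the sorted data
mode_mark = statistics.mode(marks)      # most frequent value

print("mean:", mean_mark)
print("median:", median_mark)
print("mode:", mode_mark)
```

For this list the mean is 71, while the median and mode are both 72. The three measures can drift further apart when the data contains extreme values.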

2. Measures of Spread: These are descriptive statistics that describe how similar or varied the observed values are for a data item.

Let’s look at the scores of the 50 students again. Their median mark may be 72. However, not all students will have scored 72 marks. Measures of spread look at how far the individual marks lie from that central value, for example whether most marks cluster close to 72 or are scattered widely above and below it.

Range, Absolute Deviation (AD), Variance, Quartiles, and Standard Deviation (SD) are some of the statistics with which we can describe this spread.
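The measures listed above can be sketched with the standard `statistics` module. The `marks` list is hypothetical, and the sketch assumes it is the whole group of interest, which is why it uses the population forms `pvariance` and `pstdev`. The absolute deviation here is computed as the mean absolute deviation from the mean, which is one common reading of the term.

```python
import statistics

# Hypothetical marks for ten students (illustration only).
marks = [55, 61, 64, 68, 72, 72, 72, 75, 80, 91]

# Range: distance between the highest and lowest mark.
data_range = max(marks) - min(marks)

# Mean absolute deviation: average distance of each mark from the mean.
mean_mark = statistics.mean(marks)
mean_abs_dev = sum(abs(m - mean_mark) for m in marks) / len(marks)

# Variance and standard deviation, treating the data as a full population.
variance = statistics.pvariance(marks)
std_dev = statistics.pstdev(marks)

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(marks, n=4)

print("range:", data_range)
print("mean absolute deviation:", mean_abs_dev)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
print("quartiles:", q1, q2, q3)
```

If the marks were a sample from a larger group rather than the whole group, `statistics.variance` and `statistics.stdev` would be the usual choice instead.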


What is Inferential Statistics?

Above, we explored Descriptive Statistics with an example regarding the results of 50 pieces of students’ coursework. That example was an analysis of the entire population.

In practice, however, we often do not have access to the whole population and only have a limited amount of data.

In such cases, Inferential Statistics comes into play. For instance, we might be interested in the average exam mark of an entire school. Collecting every student’s marks may not be feasible, so instead we measure a smaller sample of students (for example, a sample of 100 students). This sample of 100 students then represents the entire population of students at that school.

In a nutshell, Inferential Statistics makes predictions about a population based on a sample of data taken from that population.

The technique of Inferential Statistics involves the following steps:

First, draw a sample that represents the entire population as accurately as possible.
Next, test the sample and use it to draw generalizations about the whole population.
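The two steps can be sketched for the school example as follows. This is only an illustration under stated assumptions: the population is simulated (2,000 hypothetical students with roughly normal marks), and the interval uses a normal approximation, which is one of several ways to quantify uncertainty about a sample mean.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: marks for 2,000 students, clipped to the 0-100 range.
population = [min(100.0, max(0.0, random.gauss(65, 12))) for _ in range(2000)]

# Step 1: draw a simple random sample of 100 students.
sample = random.sample(population, k=100)

# Step 2: use the sample to generalize to the whole population.
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# z-value for a 95% interval under a normal approximation (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"Estimated school average: {sample_mean:.1f}")
print(f"Approximate 95% confidence interval: {lower:.1f} to {upper:.1f}")
# For comparison only; in practice the population mean is unknown.
print(f"Actual population average: {statistics.mean(population):.1f}")
```

The interval expresses the result in terms of probability: the sample gives an estimate of the school average together with a range that reflects how uncertain that estimate is.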

Descriptive vs Inferential Statistics: Key Differences

Descriptive Statistics	Inferential Statistics
It is concerned with describing the population under study. Sampling is not required.	It focuses on drawing conclusions about a population based on sample analysis.
Collects, organizes, analyzes, and presents the data in a meaningful way.	Compares data, tests hypotheses, and makes predictions about future outcomes.
Results take the form of charts, graphs, and tables.	Results are expressed in terms of probability.
It describes a situation.	It explains the likelihood of the occurrence of an event.
It explains data that is already known, summarizing the sample.	It attempts to reach conclusions about the population that extend beyond the data available.

Both Descriptive and Inferential Statistics are equally critical to advancements across scientific fields like data science. That’s why it is extremely important for statisticians and data scientists to understand that both methods have their own advantages and limitations, and that in the Descriptive vs Inferential Statistics debate, strictly choosing one over the other would be a waste of time.
