What Is Reinforcement Learning?

7 minute read | August 9, 2019
Leah Zitter

In real life, much of our routine learning, predicting, and decision-making runs on reinforcement learning. It makes sense, then, that scientists build machines using this principle too.

Reinforcement Learning Defined

Here is a simple definition: Think of reinforcement learning as any type of learning that comes about through, and is reinforced by, positive or negative stimuli. These stimuli cause you to adopt, retain, or stop a certain habit.

For example, when you mastered the alphabet, you were likely rewarded with hearty compliments from your teacher. You felt good: “Hey, I did it!” Your teacher incrementally rewarded you each small step along the way until you learned how to read. 

That’s reinforcement learning: you learned something in life, and the learning was reinforced through feedback.

Reinforcement Learning: The Model

In practice, the reinforcement learning model looks like this:

[Figure: the reinforcement learning model]

You, as the agent, are in a particular situation (state S). You take a specific action (A) to achieve your goal, and you receive feedback in the form of a punishment or reward (R).

Reinforcement learning, in other words, is a system of trial and error that comes through interaction with your environment.
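The state, action, and reward cycle described above can be sketched in a few lines of Python. This is only an illustration: the `Corridor` environment, its reward numbers, and the `run_episode` helper are hypothetical names made up for this example, not part of any library.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..4, with the goal at cell 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (step left) or +1 (step right); walls clamp the move
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # assumed feedback: small punishment per step, reward at the goal
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and record (state S, action A, reward R) per step."""
    state = env.reset()
    history = []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        history.append((state, action, reward))
        state = next_state
        if done:
            break
    return history


# Trial and error in its crudest form: an agent that acts at random.
rng = random.Random(0)
random_history = run_episode(Corridor(), lambda state: rng.choice([-1, 1]))
print(len(random_history), "steps taken by the random agent")
```

A learning agent would replace the random policy with one that changes in response to the rewards it has recorded.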

Data scientists use these same reinforcement learning principles for programming algorithms to perform tasks.

How Machine Reinforcement Learning Works

Translated to the machine learning world, what you have is a system of trial and error, where the algorithm, or agent, learns from missteps in its simulated environment and gets rewarded after each small success.

Programmers often try various probabilistic models until they find the most effective one. This is the model where the algorithm makes the fewest mistakes and collects the most reward in the shortest period of time.

Industries That Use Machine Reinforcement Learning

Some of the industries that commonly utilize machine reinforcement learning include the following:

Internet advertising 

Online companies like Facebook use machine learning and data science processes to analyze your preferences, background, and online behavior patterns so they can direct relevant ads your way. Since your habits and preferences tend to change, researchers can use an algorithm called deep Q-learning (more on that later) to keep those ads up to date.


Gaming

Reinforcement learning principles run through gaming. Let’s take Pac-Man, for instance. In the classic video game, our friendly yellow hero has to gobble all the dots in the maze without being touched by the ghosts before the player can advance to the next stage. A perfect Pac-Man game is one where the player scores maximum points in the shortest amount of time.

Many games are premised on similar reinforcement learning principles.


Finance

Reinforcement learning is used in the finance industry in various ways. One example is trading, where algorithms are trained to forecast market behavior. IBM, for instance, built a financial trading system on its Data Science Experience platform (now called Watson Studio) that uses reinforcement learning to develop algorithms for calculating the profits and losses of industries.


Robotics

Programmers use reinforcement learning to train robots. Sophisticated algorithms that program robot behavior are developed in controlled environments and led through sequential actions to complete a particular task. A value is assigned for each success, and algorithms are rated on their maximum cumulative rewards, or values. Such deep reinforcement learning methods teach four-legged robots (for instance) how to recover when they fall.

Vehicular navigation

Reinforcement learning is used for training driverless vehicles. U.K.-based Wayve, for example, taught a car to follow a lane on its own within 15-20 minutes of training. A human driver sat in the car to intervene when necessary. The underlying algorithm learned by trial and error, searching for the behavior that would let the vehicle complete its drive without accidents or intervention.

Other industries that use reinforcement learning include:

  • Medicine
  • Manufacturing
  • Computer networking
  • Industrial logistics

Basic Reinforcement Learning Techniques

Some of the basic reinforcement learning methods that scientists use for programming machines to achieve their goals include the following:

Markov decision process (MDP) 

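A Markov decision process is the standard mathematical framing of a reinforcement learning problem: a set of states, the actions available in each state, the probability that an action leads to each possible next state, and the reward attached to each transition. The agent is given several optional paths, and its expected success along each is calculated from those probabilities. The most effective path is the one that helps the agent reach its goal at the lowest cost. A common illustration is the shortest path problem.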
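As a rough sketch, an MDP can be written down as a plain lookup table. The two-path example below (a short risky path and a long safe one) and the helper names `validate`, `expected_reward`, and `sample_step` are all invented for illustration.

```python
import random

# transitions[state][action] -> list of (probability, next_state, reward)
# Hypothetical example: a short but risky path and a long but safe one.
TRANSITIONS = {
    "start": {
        "short": [(0.8, "goal", 10.0), (0.2, "start", -1.0)],
        "long": [(1.0, "midway", -1.0)],
    },
    "midway": {
        "forward": [(1.0, "goal", 10.0)],
    },
    "goal": {},  # terminal state: no actions available
}


def validate(transitions):
    """Check that the outcome probabilities of every action sum to 1."""
    for state, actions in transitions.items():
        for action, outcomes in actions.items():
            total = sum(prob for prob, _, _ in outcomes)
            assert abs(total - 1.0) < 1e-9, (state, action)


def expected_reward(transitions, state, action):
    """Probability-weighted immediate reward of taking an action in a state."""
    return sum(prob * reward for prob, _, reward in transitions[state][action])


def sample_step(transitions, state, action, rng):
    """Draw one random outcome of taking the action in the given state."""
    outcomes = transitions[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


validate(TRANSITIONS)
print(expected_reward(TRANSITIONS, "start", "short"))
print(sample_step(TRANSITIONS, "start", "short", random.Random(0)))
```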

Dynamic programming (DP)

This is where you solve complex problems by breaking them down into smaller subproblems, solving each one, and reusing those solutions. In reinforcement learning, dynamic programming methods generally assume the rules of the environment are fully known. For instance, a robot has to learn various things: how to move its legs, hands, etc. You break each of these problems into different reinforcement learning environments to simplify your task.
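One classic dynamic programming method is value iteration, sketched below on a made-up five-cell corridor where every step costs 1 and the goal is the last cell. The function name, the corridor, and the cost of 1 per step are assumptions chosen for the example. Each cell's value is computed from the values of its neighbors, which is the "reuse the subproblem" idea in action.

```python
def value_iteration(n_states=5, gamma=1.0, tol=1e-9):
    """Value iteration on a hypothetical corridor with the goal at the end.

    Every move costs -1, moves are deterministic, and the goal is terminal.
    Returns the best achievable total reward from each cell.
    """
    goal = n_states - 1
    values = [0.0] * n_states
    while True:
        delta = 0.0
        for state in range(n_states):
            if state == goal:
                continue  # terminal state keeps a value of 0
            candidates = []
            for move in (-1, 1):
                nxt = min(max(state + move, 0), goal)
                # value of a move = its cost + value of where it lands
                candidates.append(-1.0 + gamma * values[nxt])
            best = max(candidates)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


print(value_iteration())
```

With these settings, each value is the negative of the number of steps to the goal, so following the highest-valued neighbor traces the shortest path.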

Reward maximization 

This objective totals the values, or rewards, that the robot gathers on its way. In the usual formula, the sum of expected rewards is indexed by k and starts at k = 0.
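A minimal sketch of that running total follows. The discount factor `gamma`, which makes later rewards count for less, is a common convention but an assumption here, since the article does not mention it. Setting `gamma=1.0` gives a plain sum.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting the k-th one by gamma**k (k starts at 0)."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total


# With gamma = 0.5, three rewards of 1 are worth 1 + 0.5 + 0.25.
print(discounted_return([1, 1, 1], gamma=0.5))
```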

A related tool is the epsilon-greedy strategy: most of the time the agent picks the action with the highest estimated reward, and with a small probability (epsilon) it tries a random action instead.
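Epsilon-greedy action selection can be sketched on a toy "slot machine" problem, where each arm pays out with a different probability. The payout probabilities, step count, and function names below are hypothetical values picked for the illustration.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-looking one."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a hypothetical multi-armed bandit and track average reward per arm."""
    rng = random.Random(seed)
    estimates = [0.0] * len(win_probs)
    counts = [0] * len(win_probs)
    for _ in range(steps):
        arm = epsilon_greedy(estimates, epsilon, rng)
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running average reward for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8])
print("estimated payout per arm:", estimates)
print("times each arm was pulled:", counts)
```

The occasional random pull is what lets the agent discover that the second arm pays better, after which it pulls that arm most of the time.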

Policy gradient 

This is where you train the policy itself: a function that maps what the agent observes to a probability for each possible action. The policy’s parameters are nudged in the direction that makes well-rewarded actions more likely. That’s the premise behind IBM’s stochastic trading algorithm, for example.
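The sketch below shows one simple policy gradient method (often called REINFORCE) on a two-action toy problem. It illustrates the general idea only and is not IBM's algorithm. The payout probabilities, learning rate, and function names are assumptions for the example.

```python
import math
import random


def softmax(prefs):
    """Turn a list of preferences into action probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(win_probs, steps=2000, lr=0.1, seed=0):
    """Learn action probabilities for a hypothetical bandit by policy gradient."""
    rng = random.Random(seed)
    prefs = [0.0] * len(win_probs)  # the policy's parameters
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        reward = 1.0 if rng.random() < win_probs[action] else 0.0
        # gradient of log pi(action) with respect to each preference is
        # (1 if it is the chosen action else 0) minus that action's probability
        for a in range(len(prefs)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * reward * grad
    return softmax(prefs)


print(train_policy([0.2, 0.8]))
```

Unlike the value-based examples, nothing here estimates how good each action is. The probabilities themselves are what gets learned.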


Q-learning

This is a commonly used model-free approach, where you update certain values (called Q values) as your agent stumbles through its trial and error routine. Each Q value estimates the total reward the agent can expect from taking a given action in a given state. Deep Q-learning is where a deep neural network is used to estimate those Q values.
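A minimal tabular Q-learning sketch is shown below, again on a made-up five-cell corridor with a reward of 1 at the last cell. The environment, the hyperparameters (`alpha`, `gamma`, `epsilon`), and the function names are assumptions chosen for illustration.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0, max_steps=10_000):
    """Tabular Q-learning on a hypothetical corridor with the goal at the end."""
    rng = random.Random(seed)
    goal = n_states - 1
    actions = (-1, 1)  # step left or step right
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            if state == goal:
                break
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                # exploit, breaking ties at random
                best = max(q[(state, a)] for a in actions)
                action = rng.choice(
                    [a for a in actions if q[(state, a)] == best])
            nxt = min(max(state + action, 0), goal)
            reward = 1.0 if nxt == goal else 0.0
            # move the Q value toward reward + discounted best next value
            target = reward + gamma * max(q[(nxt, a)] for a in actions)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q, n_states=5):
    """Best action in each non-goal cell according to the learned Q values."""
    return [max((-1, 1), key=lambda a: q[(s, a)]) for s in range(n_states - 1)]


print(greedy_policy(q_learning()))
```

The agent is never told the corridor's layout. It learns that stepping right is best purely from the rewards it stumbles on, which is what makes the approach model-free.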

Types of Reinforcement Learning

Model-free vs. model-based 

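The model-based method is when the agent has, or learns, a model of its environment (a prediction of which state and reward will follow each action) and uses that model to plan ahead. Games with known rules, for instance, lend themselves to model-based methods. In contrast, model-free is where the agent has no such model and learns purely from the outcomes of its own trial and error, whether in a simulator or in a real-life environment. That’s what occurs, at a certain stage, with driverless cars.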

Exploration tasks vs. exploitation tasks 

An agent may need to gather as much information as possible about an environment by trying actions whose outcomes it doesn’t yet know. That’s called exploration. Alternatively, it may exploit what it has already learned by choosing the actions it currently believes earn the most reward. Most algorithms have to balance the two: with too little exploration the agent may miss better options, and with too much it wastes time on poor ones.

Continuous vs. episodic reinforcement learning

Continuous (also called continuing) reinforcement learning tasks have no natural endpoint. An example is an agent that runs automated Forex or stock trading. Episodic tasks, on the other hand, end at a certain point. Think gaming, where we shoot our opponents or we get killed by them. Either way, the episode ends.

Value-based reinforcement learning

This is where you estimate the value (the expected cumulative reward) of each state or action and choose the path with the highest estimated value.

Policy or action-based 

In this case, you learn the policy directly, meaning the most effective action for each situation. For example, a driverless car learns that when it sees a red light it needs to brake.

The Limitations of Reinforcement Learning

Reinforcement learning has three main limitations to keep in mind:

  1. The danger of using the model-free method. For instance, since 2014, there have been 34 reported accidents with self-driving cars on California’s roads alone, according to state incident reports.
  2. The agent acquires (and is rewarded for) new knowledge that often causes it to forget the old.
  3. The agent performs the task, but not in the optimal or required way. For instance, the robot kangaroo hits its goal in record time. The only problem? It trotted its way to the end instead of hopping.

To overcome these limitations, some organizations, like Google, combine reinforcement learning with deep learning methods.

Deep Learning Techniques

Deep learning, put simply, is where AI algorithms learn from a huge amount of data, typically using multi-layered neural networks. Say you want your robot to recognize cats. You feed it lots and lots of images of cats that include differences in shape, color, and even types of fur and whiskers so that eventually the robot can tell a cat from a dog.

That’s broadly how Facebook built its DeepFace algorithm. The facial recognition system can pick your face out from countless others because it was trained on a vast number of data points: the curve of your mouth, the color of your eyes, the spread of your nostrils, and so forth.

Reinforcement Learning vs. Deep Learning

The major difference between reinforcement learning and deep learning is that with reinforcement learning, algorithms learn from trial and error. By contrast, when it comes to deep learning, algorithms learn from a huge amount of data. In practice, you can combine the two, for example by using a deep neural network as the part of the agent that estimates values or chooses actions, and then improving it through trial and error. The integration of both is called deep reinforcement learning.


Reinforcement learning is an endlessly fascinating subject with deep, practical insights. Scientists and programmers who work in this field literally shape the world of the future. 

That person could be you.

About Leah Zitter

Leah Zitter is a fintech writer and researcher with more than 10 years of experience writing for digital and print media, B2B and B2C organizations, small- and mid-sized businesses, marketing/advertising/PR agencies, and governments, among others. She has a Ph.D. in psychology/scientific research.