In real life, much of our routine learning, predicting, and decision-making runs on reinforcement. It makes sense, then, that scientists build machines on this principle too.
Reinforcement Learning Defined
Here is a simple definition: reinforcement learning is any type of learning that comes about through, and is reinforced by, positive or negative stimuli. These stimuli cause you to adopt, retain, or drop a certain habit.
For example, when you mastered the alphabet, you were likely rewarded with hearty compliments from your teacher. You felt good: “Hey, I did it!” Your teacher incrementally rewarded you each small step along the way until you learned how to read.
That’s reinforcement learning: something you learned in life, reinforced through feedback.
Reinforcement Learning: The Model
In practice, the reinforcement learning model looks like this:
You, as the agent, are in a particular situation (state S); you take a specific action (A) to achieve your goal; and you receive feedback in the shape of a reward or punishment (R).
Reinforcement learning, in other words, is a system of trial and error that comes through interaction with your environment.
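The state-action-reward loop above can be sketched in a few lines of Python. The number-line environment here is a made-up toy for illustration, not something from any particular library:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# A toy environment: the agent must reach position 4 on a number line.
# State S = its current position; actions A = step left (-1) or right (+1);
# reward R = 1 when it reaches the goal.
def step(state, action):
    next_state = max(0, min(4, state + action))
    reward = 1 if next_state == 4 else 0
    return next_state, reward, next_state == 4

state, total_reward = 0, 0
done = False
while not done:
    action = random.choice([-1, 1])   # trial and error: act at random
    state, reward, done = step(state, action)
    total_reward += reward            # feedback reinforces the behavior

print(state, total_reward)  # prints: 4 1
```

The agent blunders around until the environment rewards it; a learning algorithm would use that reward to prefer the actions that produced it.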
Data scientists use these same reinforcement learning principles for programming algorithms to perform tasks.
How Machine Reinforcement Learning Works
Translated to the machine learning world, what you have is a system of trial and error, where the algorithm, or agent, learns from missteps in its simulated environment and gets rewarded after each small success.
The algorithm is often led through various probabilistic models until programmers find the most effective one: the model where the algorithm makes the fewest mistakes and earns the greatest reward in the shortest period of time.
Industries That Use Machine Reinforcement Learning
Some of the industries that commonly utilize machine reinforcement learning include the following:
Online companies like Facebook use machine learning to analyze your preferences, background, and online behavior patterns so they can direct relevant ads your way. Since your habits and preferences tend to change, researchers use an algorithm called the deep Q-learning method (more on that later) to constantly update those ads.
Reinforcement learning is the principle behind gaming. Let’s take Pac-Man, for instance. In the classic video game, our friendly yellow hero has to gobble all the dots in the maze, without being touched by the ghosts, before the player can advance to the next stage. A perfect Pac-Man game is one where the player scores maximum points and achieves this feat in the shortest amount of time.
All games are premised on similar reinforcement learning principles.
Reinforcement learning is used in the finance industry in various ways. One example is trading, where algorithms are trained to forecast market behavior. IBM, for instance, built a financial trading system on its Data Science Experience platform (now called Watson Studio) that uses reinforcement learning to develop trading algorithms based on calculated profits and losses.
Programmers use reinforcement learning to train robots. Sophisticated algorithms that govern robot behavior are developed in controlled environments and led through sequential actions to complete a particular task. Values are awarded for each success, and algorithms are rated as successful based on their maximum cumulative rewards, or values. Such deep reinforcement learning methods teach four-legged robots (for instance) how to recover when they fall.
Reinforcement learning is used for training driverless vehicles. U.K.-based Wayve, for example, taught its autonomous vehicles to drive independently within 15-20 minutes. A human driver was placed in the car to intervene when necessary. The underlying algorithms used different trial and error situations for finding the best model that would help the vehicle complete its drive without accidents or intervention.
Other industries that use reinforcement learning include:
- Computer networking
- Industrial logistics
Basic Reinforcement Learning Techniques
Some of the basic reinforcement learning methods that scientists use for programming machines to achieve their goals include the following:
Markov decision process (MDP)
The agent is given several possible paths, and its success along each is estimated with probabilistic calculations. The best path is the one that lets the agent reach its goal with the fewest hurdles; this is also known as the shortest path problem.
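As an illustration of the shortest path problem, here is a minimal breadth-first search over a toy state graph (the graph itself is invented for the example):

```python
from collections import deque

# Toy state graph: keys are states, values are the states reachable in one step.
graph = {
    "start": ["a", "b"],
    "a": ["goal"],
    "b": ["c"],
    "c": ["goal"],
    "goal": [],
}

def shortest_path(graph, source, target):
    """Breadth-first search: the path with the fewest steps from source to target."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # target unreachable

print(shortest_path(graph, "start", "goal"))  # ['start', 'a', 'goal']
```

A real MDP would attach probabilities and rewards to each transition; this sketch only shows the "fewest hurdles" idea.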
Dynamic programming (DP)
This is where you solve complex problems by breaking the environment down into subproblems and using the principles of reinforcement learning in each. For instance, a robot has to learn various things: how to move its legs, hands, etc. You break each of these problems into different reinforcement learning environments to simplify your task.
Value iteration
This algorithm totals the values, or rewards, that the robot gathers on its way: the cumulative expected reward is the sum of rewards from step k = 0 onward, with later rewards typically discounted.
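The running total of rewards is usually written as a discounted sum, G = Σₖ γᵏ·Rₖ. A minimal sketch, with a made-up reward sequence:

```python
# Made-up rewards collected at steps k = 0, 1, 2, ... and a discount factor gamma.
rewards = [0, 0, 1, 0, 5]
gamma = 0.9

# Cumulative expected reward: G = sum over k >= 0 of gamma**k * R_k
G = sum(gamma**k * r for k, r in enumerate(rewards))
print(G)  # roughly 4.09
```

The discount factor gamma makes rewards collected sooner worth more than the same rewards collected later.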
Epsilon-greedy
Here the best action is the one with the maximum expected reward (the greedy choice), but a small fraction of the time, epsilon, a random action is tried instead; this is called the epsilon-greedy algorithm.
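A minimal epsilon-greedy selector might look like this (the action values are invented for illustration):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the best-valued action (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return q_values.index(max(q_values))

# Made-up action values: action 1 currently looks best.
q = [0.2, 0.9, 0.5]
print(epsilon_greedy(q, epsilon=0.0))  # with no exploration, always picks action 1
```

Setting epsilon to 0 makes the agent purely greedy; raising it makes the agent explore more.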
Stochastic (policy-based) methods
This is where you train an algorithm to act according to probabilistic rules; in reinforcement learning, those rules are called policies. That’s the premise behind IBM’s stochastic trading algorithm, for example.
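One common way to build such a stochastic policy is a softmax over action preferences. This is a generic sketch with made-up numbers, not IBM’s algorithm:

```python
import math
import random

def softmax_policy(preferences):
    """Turn numeric action preferences into a probability distribution
    (a stochastic policy): higher preference -> higher probability."""
    exps = [math.exp(p) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

# Three actions with made-up preferences.
probs = softmax_policy([2.0, 1.0, 0.1])
action = random.choices(range(3), weights=probs)[0]  # sample an action from the policy
print(probs)
```

Because actions are sampled rather than fixed, the agent naturally keeps trying less-preferred actions once in a while.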
Q-learning
This is a commonly used model-free approach, where you update certain values (called Q values) as your agent stumbles through its trial-and-error routine. The update rule for these values is the Q-learning algorithm, and deep Q-learning is where you combine Q-learning with deep learning methods.
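A minimal tabular Q-learning sketch on a toy corridor environment (the environment and hyperparameters are assumptions for illustration):

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the run is reproducible

# Toy corridor: states 0..4; actions move left (-1) or right (+1); reward 1 at state 4.
def step(state, action):
    next_state = max(0, min(4, state + action))
    return next_state, (1 if next_state == 4 else 0), next_state == 4

Q = defaultdict(float)                 # Q values, keyed by (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

for episode in range(200):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action choice: explore occasionally, otherwise exploit.
        if random.random() < epsilon:
            a = random.choice([-1, 1])
        else:
            a = max([-1, 1], key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# After training, moving right should look better than moving left in every state.
print(all(Q[(s, 1)] > Q[(s, -1)] for s in range(4)))
```

Deep Q-learning replaces the lookup table with a neural network that approximates the Q values, which is what makes the method scale to problems like ad targeting or Atari games.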
Types of Reinforcement Learning
Model-free vs. model-based
In the model-based method, the agent works with a model of its environment, often a simulated one built for training; games, with their fully known rules, are often programmed this way. In contrast, model-free methods let the agent learn purely from trial and error, without any model of its environment. That’s what occurs, at a certain stage, with driverless cars.
Exploration tasks vs. exploitation tasks
Programmers may want the agent to gather as much information as possible about an environment; that’s called exploration. Alternatively, they may have a different (or additional) goal: to exploit what the agent already knows, steering it toward the actions that have already proven most rewarding.
Continuous vs. episodic reinforcement learning
Continuous reinforcement learning tasks go on forever; think of an agent that runs automated Forex or stock trading around the clock. Episodic tasks, on the other hand, end at a certain point. Think gaming: we shoot our opponents or we get killed by them, and either way the episode ends.
Value-based reinforcement learning
This is where you focus on value as your condition of success: the agent follows the path with the highest expected value, or cumulative reward.
Policy or action-based
In this case, you focus on the most effective action in each situation; e.g., a driverless car learns that when it sees a red light, it needs to brake.
The Limitations of Reinforcement Learning
Reinforcement learning has three main limitations to keep in mind:
- The danger of using the model-free method in the real world. For instance, since 2014 there have been 34 reported accidents involving self-driving cars on California’s roads alone, according to state incident reports.
- The agent acquires (and is rewarded for) new knowledge that often causes it to forget what it learned before, a problem known as catastrophic forgetting.
- The agent performs the task, but not in the optimal or required way. For instance, the robot kangaroo hits its goal in record time. The only problem? It trotted its way to the end instead of hopping.
To overcome these limitations, some organizations, like Google, combine reinforcement learning with deep learning methods.
Deep Learning Techniques
Deep learning, put simply, is where AI algorithms learn from a huge amount of data. Say you want your robot to recognize cats: you feed it lots and lots of images of cats, with differences in shape, color, even type of fur and whiskers, until eventually the robot can tell a cat from a dog.
That’s essentially how Facebook programmed its DeepFace algorithm. The facial recognition system picks your face out from countless others because it has been fed an enormous number of data points: the curve of your mouth, the color of your eyes, the spread of your nostrils, and so forth.
Reinforcement Learning vs. Deep Learning
The major difference between reinforcement learning and deep learning is how the algorithm learns: with reinforcement learning, it learns from trial and error; with deep learning, it learns from a huge amount of data. In practice, you can combine the two by first training your algorithm on large libraries of data and then refining its behavior with a reinforcement learning system. The integration of both is called deep reinforcement learning.
Reinforcement learning is an endlessly fascinating subject with deep, practical insights. Scientists and programmers who work in this field literally shape the world of the future.
That person could be you.
Here are some additional resources to learn more:
- Reinforcement Learning: An Introduction, by Richard Sutton and Andrew Barto, is a classic with a clear and simple account of the key ideas and algorithms of reinforcement learning.
- David Silver’s Reinforcement Learning classes on YouTube
- Reinforcement Learning, a free course offered by Georgia Tech.
- Springboard’s Machine Learning Engineering Career Track, a flexible, mentor-led online program with a job guarantee.