{"id":8416,"date":"2019-08-09T12:48:43","date_gmt":"2019-08-09T19:48:43","guid":{"rendered":"https:\/\/www.springboard.com\/?p=8416"},"modified":"2023-09-28T00:49:22","modified_gmt":"2023-09-28T07:49:22","slug":"reinforcement-learning","status":"publish","type":"post","link":"https:\/\/www.springboard.com\/blog\/data-science\/reinforcement-learning\/","title":{"rendered":"What Is Reinforcement Learning?"},"content":{"rendered":"\n<p><span style=\"font-weight: 400;\">In real life, much of our routine learning, predicting, and decision-making runs on reinforcement learning. It makes sense, then, that scientists build machines using this principle too.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Reinforcement Learning Defined<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">Here is a simple definition: Think of reinforcement learning as any type of learning that comes about through, and is reinforced by, either positive or negative stimuli. These stimuli cause you to adopt, retain, or stop a certain habit.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">For example, when you mastered the alphabet, you were likely rewarded with hearty compliments from your teacher. 
You felt good: &#8220;Hey, I did it!&#8221; Your teacher incrementally rewarded you for each small step along the way until you learned how to read.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">That&#8217;s reinforcement learning: you learn something in life, and the lesson is reinforced through feedback.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Reinforcement Learning: The Model<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">In practice, the reinforcement learning model looks like this:<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1056\" height=\"428\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image1-2.png\" alt=\"reinforcement learning\" class=\"wp-image-8417\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image1-2.png 1056w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image1-2-400x162.png 400w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image1-2-768x311.png 768w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image1-2-380x154.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image1-2-700x284.png 700w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image1-2-380x154.png 420w\" sizes=\"(max-width: 1056px) 100vw, 1056px\" \/><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">You, as the agent, are in a particular <\/span><i><span style=\"font-weight: 400;\">situation<\/span><\/i><span style=\"font-weight: 400;\"> (state S), you take a specific <\/span><i><span style=\"font-weight: 400;\">action<\/span><\/i><span style=\"font-weight: 400;\"> (A) to achieve your goal, and you receive your feedback in the form of <\/span><i><span style=\"font-weight: 400;\">punishment<\/span><\/i><span 
style=\"font-weight: 400;\"> or <\/span><i><span style=\"font-weight: 400;\">reward<\/span><\/i><span style=\"font-weight: 400;\"> (R).&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Reinforcement learning, in other words, is a system of trial and error that comes through interaction with your environment.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\"><a href=\"https:\/\/www.springboard.com\/blog\/data-science\/what-does-a-data-scientist-do\/\" data-type=\"post\" data-id=\"24427\">Data scientists<\/a> use these same reinforcement learning principles for programming algorithms to perform tasks.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">How Machine Reinforcement Learning Works<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">Translated to the machine learning world, what you have is a system of trial and error, where the algorithm, or agent, learns from missteps in its simulated environment and gets rewarded after each small success.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The algorithm is often led through various probabilistic models until programmers find the one that is the most effective. 
This is the model where the algorithm makes the fewest mistakes and gets the greatest number of rewards in the shortest period of time.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Industries That Use Machine Reinforcement Learning<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">Some of the industries that commonly utilize machine reinforcement learning include the following:<\/span><\/p>\n\n\n\n<p><b>Internet advertising&nbsp;<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Online companies like Facebook use machine learning and <a href=\"https:\/\/www.springboard.com\/blog\/data-science\/data-science-definition\/\">data science<\/a> processes to analyze your preferences, background, and online behavior patterns so they can direct relevant ads your way. Since your habits and preferences tend to change, researchers use an algorithm called the <\/span><b>deep Q-learning method<\/b><span style=\"font-weight: 400;\"> (more on that later) to constantly update those ads.<\/span><\/p>\n\n\n\n<p><b>Gaming<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Games are a natural fit for reinforcement learning. Let&#8217;s take <\/span><a href=\"http:\/\/www.freepacman.org\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Pac-Man<\/span><\/a><span style=\"font-weight: 400;\"> for instance. In the classic video game, our friendly yellow hero has to gobble all the dots in the maze without being caught by the ghosts before the player can advance to the next stage. A perfect Pac-Man game is where the player scores maximum points while achieving this feat in the shortest amount of time.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Many games are built on a similar loop of actions, rewards, and penalties.<\/span><\/p>\n\n\n\n<p><b>Finance<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Reinforcement learning is used in the finance industry in various ways. 
One example is trading, where algorithms are trained to forecast market behavior. IBM, for instance, built a financial trading system on its <\/span><a href=\"https:\/\/developer.ibm.com\/recipes\/tutorials\/ibm-data-science-experience-dsx-platform\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Data Science Experience<\/span><\/a><span style=\"font-weight: 400;\"> platform (now called <\/span><a href=\"https:\/\/cloud.ibm.com\/catalog\/services\/watson-studio\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Watson Studio<\/span><\/a><span style=\"font-weight: 400;\">) that uses reinforcement learning to develop algorithms for calculating the profits and losses of industries.<\/span><\/p>\n\n\n\n<p><b>Robotics<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Programmers use reinforcement learning to train robots. Sophisticated algorithms that program robot behavior are developed in controlled environments and led through sequential actions to complete a particular task. Values are accorded for each success, and algorithms are rated successful based on their maximum cumulative rewards, or values. Such deep reinforcement learning methods teach four-legged robots (for instance) how to recover when they fall.<\/span><\/p>\n\n\n\n<p><b>Vehicular navigation<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Reinforcement learning is used for training driverless vehicles. U.K.-based <\/span><a href=\"https:\/\/wayve.ai\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Wayve<\/span><\/a><span style=\"font-weight: 400;\">, for example, taught its autonomous vehicles to drive independently within 15-20 minutes. A human driver was placed in the car to intervene when necessary. 
The underlying algorithms used different trial and error situations for finding the best model that would help the vehicle complete its drive without accidents or intervention.&nbsp;<\/span><\/p>\n\n\n\n<p><b>Other industries that use reinforcement learning include:<\/b><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\">Medicine<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Manufacturing<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Computer networking<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">Industrial logistics<\/span><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Basic Reinforcement Learning Techniques<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">Some of the basic reinforcement learning methods that scientists use for programming machines to achieve their goals include the following:<\/span><\/p>\n\n\n\n<p><b>Markov decision process (MDP)&nbsp;<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The agent is fed several optional paths and its success along each is calculated through probabilistic algorithms. The shortest, most effective path would be the one that helps the agent reach its goal with the fewest hurdles. 
This simple case is known as the <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Shortest_path_problem\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">shortest path problem<\/span><\/a><span style=\"font-weight: 400;\">. More generally, an MDP describes the environment as a set of states, the actions available in each state, the probability of moving from one state to another, and the reward attached to each move.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"805\" height=\"422\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image3-1.png\" alt=\"Markov decision process\" class=\"wp-image-8418\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image3-1.png 805w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image3-1-400x210.png 400w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image3-1-768x403.png 768w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image3-1-380x199.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image3-1-700x367.png 700w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image3-1-380x199.png 420w\" sizes=\"(max-width: 805px) 100vw, 805px\" \/><\/figure>\n\n\n\n<p><b>Dynamic programming (DP)<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">This is where you solve a complex problem by breaking it down into smaller subproblems, solving each one, and combining the results. For instance, a robot has to learn various things: how to move its legs, hands, etc. 
You break each of these problems into different reinforcement learning environments to simplify your task.<\/span><\/p>\n\n\n\n<p><b>Reward maximization&nbsp;<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">This approach totals the values, or rewards, that the robot gathers on its way (in the formula, k indexes the rewards being summed, starting from k=0).&nbsp;<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"525\" height=\"224\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image2-2.png\" alt=\"Reward maximization\" class=\"wp-image-8419\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image2-2.png 525w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image2-2-400x171.png 400w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image2-2-380x162.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/image2-2-380x162.png 420w\" sizes=\"(max-width: 525px) 100vw, 525px\" \/><\/figure>\n\n\n\n<p><span style=\"font-weight: 400;\">A common strategy for pursuing this is the <\/span><a href=\"https:\/\/jamesmccaffrey.wordpress.com\/2017\/11\/30\/the-epsilon-greedy-algorithm\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Epsilon-Greedy algorithm<\/span><\/a><span style=\"font-weight: 400;\">, wherein the agent usually chooses the action with the maximum expected reward but occasionally tries a random one.&nbsp;<\/span><\/p>\n\n\n\n<p><b>Policy gradient&nbsp;<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">This is where you train an algorithm to act based on probabilistic observations. In reinforcement learning, those are called <\/span><b>policy observations<\/b><span style=\"font-weight: 400;\">. 
That\u2019s the premise behind IBM\u2019s stochastic trading algorithm, for example.<\/span><\/p>\n\n\n\n<p><b>Q-learning<\/b><span style=\"font-weight: 400;\">&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">This is a commonly used model-free approach, where you update certain values (called <\/span><b>Q values<\/b><span style=\"font-weight: 400;\">), each an estimate of the total reward expected from taking a given action in a given state, as your agent stumbles through its trial and error routine. The rule for updating those values after each step is called the <\/span><b>Q-learning algorithm<\/b><span style=\"font-weight: 400;\">.<\/span><b> Deep Q-learning <\/b><span style=\"font-weight: 400;\">is where a deep neural network is used to estimate the Q values.<\/span><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1244\" height=\"192\" src=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-14-at-12.28.46-PM.png\" alt=\"\" class=\"wp-image-8499\" srcset=\"https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-14-at-12.28.46-PM.png 1244w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-14-at-12.28.46-PM-400x62.png 400w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-14-at-12.28.46-PM-1200x185.png 1200w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-14-at-12.28.46-PM-768x119.png 768w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-14-at-12.28.46-PM-380x59.png 380w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-14-at-12.28.46-PM-700x108.png 700w, https:\/\/www.springboard.com\/blog\/wp-content\/uploads\/2019\/08\/Screen-Shot-2019-08-14-at-12.28.46-PM-380x59.png 420w\" sizes=\"(max-width: 1244px) 100vw, 1244px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Types of Reinforcement 
Learning<\/span><\/h2>\n\n\n\n<p><b>Model-free vs. model-based&nbsp;<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">The model-based method is when the agent has, or builds, a model of its environment that predicts what each action will lead to, and uses that model to plan ahead. So, for instance, game-playing agents often plan with a model of the game&#8217;s rules. In contrast, model-free is where the agent has no such model and learns purely from trial and error in the environment itself, which may be a real-life one. That&#8217;s what occurs, at a certain stage, with driverless cars.<\/span><\/p>\n\n\n\n<p><b>Exploration tasks vs. exploitation tasks&nbsp;<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">An agent may try unfamiliar actions to gather as much information as possible about an environment. That\u2019s called <\/span><b>exploration<\/b><span style=\"font-weight: 400;\">. Alternatively, it may stick with the actions it already knows earn the most reward. That\u2019s called <\/span><b>exploitation<\/b><span style=\"font-weight: 400;\">. A good algorithm balances the two.<\/span><\/p>\n\n\n\n<p><b>Continuous vs. episodic reinforcement learning<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Continuous reinforcement learning tasks have no natural end point. One example is an agent that runs automated Forex\/stock trading. Episodic tasks, on the other hand, end at a certain point. Think gaming, where we shoot our opponents or we get killed by them. 
Either way, the episode ends.<\/span><\/p>\n\n\n\n<p><b>Value-based reinforcement learning<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">This is where you estimate the value, or expected total reward, of each state or action and choose the path with the highest expected value.&nbsp;<\/span><\/p>\n\n\n\n<p><b>Policy or action-based&nbsp;<\/b><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">In this case, you focus on learning the most effective action for each situation; e.g., a driverless car learns that when it sees a red light it needs to brake.<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">The Limitations of Reinforcement Learning<\/span><\/h2>\n\n\n\n<p><b>Reinforcement learning has three main limitations to keep in mind:<\/b><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\">The danger of using the model-free method. For instance, since 2014, there have been 34 reported accidents with self-driving cars on California\u2019s roads alone, according to state incident reports.<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">The agent acquires (and is rewarded for) new knowledge that often causes it to forget the old.<\/span><\/li>\n\n\n\n<li><span style=\"font-weight: 400;\">The agent performs the task, but not in the optimal or required way. For instance, a robot kangaroo hits its goal in record time. The only problem? 
It trots its way to the end instead of hopping.<\/span><\/li>\n<\/ol>\n\n\n\n<p><span style=\"font-weight: 400;\">To overcome these limitations, some organizations, like Google, combine reinforcement learning with deep learning methods.&nbsp;<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Deep Learning Techniques<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">Deep learning, put simply, is where multi-layered neural networks learn patterns from a huge amount of data.&nbsp; Say you want your robot to recognize cats. You feed it lots and lots of images of cats that include differences in shape, color, and even types of fur and whiskers so that eventually the robot can tell a cat from a dog.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">That\u2019s broadly how Facebook built its <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/DeepFace\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">DeepFace<\/span><\/a><span style=\"font-weight: 400;\"> algorithm. The facial recognition system picks your face out from countless others because it\u2019s been fed a vast number of data points on the curve of your mouth, the color of your eyes, the spread of your nostrils, and so forth.&nbsp;<\/span><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Reinforcement Learning vs. Deep Learning<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">The major difference between reinforcement learning and deep learning is that with reinforcement learning, algorithms learn from trial and error. By contrast, when it comes to deep learning, algorithms learn from a huge amount of data. In practice, you could combine deep learning with reinforcement learning by first training your algorithm on libraries of data and then refining it through a reinforcement learning system. 
The integration of both is called <\/span><b>deep reinforcement learning.&nbsp;<\/b><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span style=\"font-weight: 400;\">Conclusion<\/span><\/h2>\n\n\n\n<p><span style=\"font-weight: 400;\">Reinforcement learning is an endlessly fascinating subject with deep, practical insights. Scientists and programmers who work in this field literally shape the world of the future.&nbsp;<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">One of them could be you.<\/span><\/p>\n\n\n\n<p><span style=\"font-weight: 400;\">Here are some additional resources to learn more:<\/span><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><span style=\"font-weight: 400;\"><a href=\"http:\/\/incompleteideas.net\/book\/bookdraft2017nov5.pdf\" target=\"_blank\" rel=\"noopener\">Reinforcement Learning: An Introduction<\/a>, by Richard Sutton and Andrew Barto, is a classic with a clear and simple account of the key ideas and algorithms of reinforcement learning. <\/span><\/li>\n\n\n\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=2pWv7GOvuf0&amp;list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">David Silver&#8217;s Reinforcement Learning classes<\/span><\/a><span style=\"font-weight: 400;\"> on YouTube<\/span><\/li>\n\n\n\n<li><a href=\"https:\/\/in.udacity.com\/course\/reinforcement-learning--ud600\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Reinforcement Learning<\/span><\/a><span style=\"font-weight: 400;\">, a free course offered by Georgia Tech.<\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In real life, all of our routine learning, predicting, and decision-making runs on reinforcement learning. It makes sense, then, that scientists build machines using this principle too. Reinforcement Learning Defined Here is a simple definition: Think of reinforcement learning as any type of learning that comes about through, and is reinforced by, either positive or [&hellip;]<\/p>\n","protected":false},"author":79,"featured_media":8421,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_eb_attr":"","_eb_data_table":"","footnotes":""},"categories":[67],"tags":[],"marketing_tags":[1476],"class_list":{"0":"post-8416","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science"},"acf":[],"_links":{"self":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/8416"}],"collection":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/users\/79"}],"replies":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/comments?post=8416"}],"version-history":[{"count":4,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/8416\/revisions"}],"predecessor-version":[{"id":50125,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/posts\/8416\/revisions\/5
0125"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media\/8421"}],"wp:attachment":[{"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/media?parent=8416"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/categories?post=8416"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/tags?post=8416"},{"taxonomy":"marketing_tags","embeddable":true,"href":"https:\/\/www.springboard.com\/blog\/wp-json\/wp\/v2\/marketing_tags?post=8416"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}