Take a look at your Netflix homepage. Scroll down, past the banner-sized trailer playing at the top—it’s probably something right up your alley, but don’t get distracted. Look for the row of thumbnails labeled “Popular on Netflix.” Coincidentally, your new favorite show appears first in the lineup. You’re surprised that the entire nation is similarly smitten with “Emily in Paris”—but hey, your fellow binge-watchers must have good taste.
Next, ask your neighbor who loves dark, brooding shows to navigate to his Netflix homepage. Coincidentally, his new favorite show, “Ozark,” is first in the “Popular on Netflix” row, while “Emily in Paris” doesn’t rank at all. Is this a glitch in the matrix? Do you and the global “Emily in Paris” fandom exist in an alternate reality?
In a word, no. The “Popular on Netflix” section displays the site’s most-watched titles—among viewers whose habits and preferences resemble yours. You and your “Ozark”-binging neighbor don’t live in opposite dimensions—you just occupy separate taste clusters. Netflix sorts users into taste clusters based on how they interact with content on the site.
Netflix is one of the many consumer-facing platforms harnessing AI-driven recommendation systems to personalize your user experience. Today, machine learning models custom-curate everything from your Instagram feed to your Spotify playlists. Personalization looks to the past to predict the future, inferring the probability that you’ll like something based on your historical patterns of behavior.
But this wasn’t always the case—early recommendation systems were not driven by machine learning algorithms, and the recommendations they generated would likely seem impersonal, irrelevant, or even random to users today.
On Netflix, your taste cluster membership influences everything from the thumbnails that appear in your top-ten galleries to the trailers that play when you hover over said thumbnails. From there, things start to get more personal. If you’ve demonstrated an appreciation for Tina Fey films, for example, but haven’t watched much content featuring Amy Poehler, the thumbnail trailer for their joint comedic romp “Baby Mama” will feature Tina Fey alone.
The Netflix Prize
In the 1990s and early 2000s, e-commerce giants like Amazon and eBay were using rudimentary recommendation systems to suggest products to customers. At that time, recommendations were largely generated by rules-based systems and if/then conditional logic. Essentially, sites would suggest popular items to users. If you bought Mariah Carey’s Christmas CD through a large e-commerce platform like Amazon, its recommendation engine would then suggest that you buy a Game Boy Color, simply because that was a widely popular purchase at the time. A lack of context hampered the efficacy of popularity-based recommendations. Why would someone want a Game Boy Color if their purchase history didn’t indicate an interest in video games or toys?
The limitations of these methods meant that dot-com era recommendation engines weren’t particularly effective at matching users with what they needed when they needed it. “There was really no real-time aspect,” said Bargava Subramanian, a Springboard data science mentor and co-founder of Binaize, a business-facing AI-driven product that analyzes consumer interactions with e-commerce websites to recommend strategies that will increase conversion rates.
Subramanian’s work in data science and AI has spanned 17 years. At the start of his career, recommendation systems existed primarily on e-commerce websites. These dot-com era engines moved slowly—the deep learning algorithms that power real-time personalization were a long way off. “A lot of things were already done on a database,” Subramanian said. “You’d let it be for a week, and then somebody would run the script manually and update it.”
The next evolution of recommendation systems focused on the items themselves. In September 1998, Amazon filed a patent for “collaborative recommendations using item-to-item similarity mappings.” Essentially, these recommendation systems analyzed users’ purchases to identify, and then recommend, items that were similar to one another. Because customers who buy diapers often buy baby formula, a customer buying one item would likely receive a recommendation for the other.
Item-to-item similarity mapping was powered by metadata—in other words, data about the items themselves. “You have some ratings, you have some information,” explained Subramanian. “If it’s a shelf, it’s made of wood … what kind of wood, what color?” This information was entered into statistical models that then determined the similarity of two items.
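As a minimal sketch of this idea, item metadata can be encoded as feature vectors and compared with a similarity measure such as cosine similarity. The items, features, and hand-made one-hot encoding below are invented for illustration; production systems used far richer statistical models.

```python
from math import sqrt

# Hypothetical item metadata, one-hot encoded by hand. Feature order:
# [is_shelf, is_table, made_of_wood, made_of_metal, color_brown, color_white]
items = {
    "oak shelf":   [1, 0, 1, 0, 1, 0],
    "pine shelf":  [1, 0, 1, 0, 0, 1],
    "steel table": [0, 1, 0, 1, 0, 1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(name):
    """Return the other item whose metadata vector is closest to `name`'s."""
    target = items[name]
    return max(
        (other for other in items if other != name),
        key=lambda other: cosine_similarity(target, items[other]),
    )

print(most_similar("oak shelf"))  # → pine shelf (shares shelf-ness and wood, unlike the table)
```

The recommendation for a shopper viewing the oak shelf would be the pine shelf, because the two items share most of their metadata, which is exactly the “what kind of wood, what color” comparison Subramanian describes.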
Then, in October 2006, came the Netflix Prize: an open competition offering $1 million to any programmer who could increase the accuracy of the company’s recommendation engine by 10%. At the time, Cinematch, the Netflix recommendation engine, employed collaborative filtering algorithms to predict whether a user would like a film based on how much they liked or disliked other films. Those predictions were used to personalize DVD rental and streaming recommendations.
The high-profile challenge made global headlines and drew attention to collaborative filtering models. “This model has existed for ten, fifteen years—but the Netflix competition made it very popular,” Subramanian said. “The rise to fame of data science as a field itself was because of this one-million-dollar competition. At that time a million dollars for a competition—a nerdy competition—was unheard of.”
Collaborative filtering models focus on item-to-item and user-to-user similarity. These algorithms make recommendations based on the behavior of users with similar preferences. If you’re a single millennial woman in New York City who likes nostalgic romantic comedies, a collaborative filtering model might recommend “10 Things I Hate About You” because other single millennial women in New York who also like nostalgic romantic comedies have watched that movie too.
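The mechanics can be sketched as a toy user-based collaborative filter: score how alike two users’ ratings are, then predict an unseen rating as a similarity-weighted average of what like-minded users gave that title. The users, titles, and ratings below are made up, and real systems handle sparsity and scale very differently.

```python
from math import sqrt

# Toy explicit ratings on a 1-5 scale (users and scores are invented).
ratings = {
    "alice": {"10 Things I Hate About You": 5, "Clueless": 4, "Ozark": 1},
    "bree":  {"10 Things I Hate About You": 4, "Clueless": 5},
    "cara":  {"Ozark": 5, "Clueless": 1},
}

def user_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
    nu = sqrt(sum(ratings[u][i] ** 2 for i in shared))
    nv = sqrt(sum(ratings[v][i] ** 2 for i in shared))
    return dot / (nu * nv)

def predict(user, item):
    """Predict a rating as a similarity-weighted average of other users' ratings."""
    scores = [
        (user_similarity(user, v), r[item])
        for v, r in ratings.items()
        if v != user and item in r
    ]
    total = sum(s for s, _ in scores)
    return sum(s * r for s, r in scores) / total if total else None
```

For example, `predict("bree", "Ozark")` blends the ratings of the two users who have seen it, weighted by how closely each one’s taste tracks bree’s.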
Collaborative filtering is now foundational to modern recommendation engines. “Almost everyone has some variant of it as the best model before they go and start building more complex models,” Subramanian said.
Increasingly sophisticated models exist today because of the large number of algorithms available to train them. These new algorithms hail from the ascendant sphere of deep learning, which uses artificial neural networks to analyze massive volumes of unstructured data. Deep learning models convert images, videos, and long-form text into meaningful data, unlocking reams of information that recommendation systems previously could not access.
The availability of vast amounts of fine-grained information enables sensitive, highly accurate user recommendations. We are now shopping, streaming, and consuming news in an era of hyper-personalization.
The evolution of personalization algorithms
“Personalization is not just about what you like,” explained Misael Manjarres, a Springboard data science mentor and the senior director of data science at Peacock, NBCUniversal’s new streaming service. “It’s about what you like at a given moment.”
Equipped with machine learning algorithms that can identify behavioral patterns, modern recommendation engines strive to deliver the right content or products at the right time. To make appropriate suggestions, recommendation engines need to know what you do, when you do it, how you do it—and why you do it.
Personalization draws on data about users, the items users interact with, and the signals that these interactions generate. Items could be products on Amazon, content on Netflix, or posts on Instagram. Each time a user interacts with an item, the user creates explicit and implicit signals about their preferences.
Instagram personalized ad targeting is a great example. “My wife asked me what I wanted for Christmas and I told her I wanted black high top sneakers, but I hadn’t found ones I loved,” Manjarres said. “The next time I opened Instagram I got an ad for exactly the type of sneaker I wanted. I bought them right away. There are hundreds of brands I would have to search through, but the targeted advertisement narrowed it down to a pair I really liked.”
Deep learning models facilitate this fine-grained item analysis. The goal is to understand what drives users to consume what they consume. Metadata is an easy place to start the analytical slicing and dicing. Is this an action film? Who directed it? When was the film released? Is it rated PG-13 or R? But deep learning models can analyze unstructured data to pinpoint more abstract characteristics.
Natural language processing can analyze the plot and theme of a movie from a free text synopsis of the content, or the entire script itself. These models then generate embeddings: numeric representations of films that capture plot and theme. Similar movies will have similar numeric representations. Image analysis is also an important part of the equation.
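To make the embedding idea concrete, here is a deliberately crude stand-in: a bag-of-words count vector in place of a learned neural embedding, with invented one-line synopses. The point is only the geometry—synopses about similar plots produce vectors that point in similar directions.

```python
from collections import Counter
from math import sqrt

def embed(synopsis):
    """Crude stand-in for a learned embedding: a sparse word-count vector."""
    return Counter(synopsis.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter returns 0 for absent words)."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

heist_1 = embed("a crew plans an elaborate casino heist")
heist_2 = embed("an elaborate heist targets a casino vault")
romcom  = embed("two rivals fall in love over a summer in paris")

print(cosine(heist_1, heist_2) > cosine(heist_1, romcom))  # → True
```

A trained language model would capture far more than shared vocabulary—paraphrases, tone, theme—but the downstream use is the same: movies whose vectors sit close together get recommended together.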
“Machine learning algorithms can analyze the video itself and tease out color patterns that tell you about mood,” Manjarres said. Mood analysis can reveal insights into a user’s emotional desires and mental state when they sit down to watch a film or TV show.
This offers context—a factor of major concern in the realm of recommendation systems. Context affects user preferences. You might want to watch “Lord of the Rings” at 10pm on a Saturday night, but that won’t be the case at 8am on Monday morning when you want to tune into the news before work. “You’re not going to be interested in all those things to the same extent every single day, and within every single context,” Manjarres said.
To deduce context, recommendation engines consider what you watched, what time you watched it, what day of the week you watched it, and what type of device you watched it on.
“You’re probably not going to watch a two-hour movie on a mobile device, but you might watch something shorter,” Manjarres said. Thanks to personalization, your mobile viewing recommendations will take the limitations of a smartphone screen into consideration.
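A sketch of what such context signals might look like as model features follows; the field names and thresholds are illustrative assumptions, not an actual streaming-service schema.

```python
from datetime import datetime

def context_features(watched_at: datetime, device: str) -> dict:
    """Derive hypothetical context features from a single viewing event."""
    return {
        "hour": watched_at.hour,
        "is_weekend": watched_at.weekday() >= 5,      # Saturday=5, Sunday=6
        "is_evening": 18 <= watched_at.hour <= 23,    # illustrative cutoff
        "is_mobile": device in {"phone", "tablet"},
    }

# Saturday 10pm on a TV vs. Monday 8am on a phone (dates chosen arbitrarily).
saturday_night = context_features(datetime(2021, 3, 6, 22, 0), "tv")
monday_morning = context_features(datetime(2021, 3, 8, 8, 0), "phone")
```

Fed alongside taste signals, features like these let a model learn that the same user wants “Lord of the Rings” in one context and a short news clip in the other.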
According to Subramanian, serendipity is also a key part of this equation. Successful recommendation systems like Spotify’s Discover Weekly playlists work to introduce users to new songs and artists that they might like. Optimizing for serendipity builds excitement and trust in the platform while simultaneously informing the recommendation engine about user preferences.
Serendipity spotlights the value of recommendation systems from a user’s perspective. “There is a vast universe that you will never be able to explore fully because you just don’t have the time,” Manjarres explained. “The job of a recommendation system is to put you in the right galaxy. A really good recommendation system will put you in your own solar system, and then it’s easy—you don’t really have to travel far, clicking around, trying to find things. It puts you where you can start to explore.”
The future of AI-based personalization
Personalization will always exist in the world of streaming, but, as the technology grows and evolves, it will soon begin to play a bigger role outside the world of entertainment. Manjarres sees a big opportunity for personalization in education, for example.
“Not all kids or people learn in the same way, and the types of things that they’re interested in are not always going to be the same,” he said. “Right now, there’s just a kind of broad stroke decision on what every kid needs to learn and how they need to learn it. That may not be the best way of doing things.”
Subramanian also sees personalization expanding beyond commonplace consumer-facing applications like streaming. Currently, recommendation systems are well known in the business-to-consumer (B2C) space, where platforms like TikTok, Instagram, and Shopify use personalization to optimize experiences for users. Conversely, he observes, recommendation systems in the business-to-business (B2B) space are relatively uncommon.
Subramanian believes that businesses need personalization too, and stand to benefit immensely from data-driven recommendations about how to maximize conversion rates and optimize their platforms for sales. AI can analyze the effectiveness of digital strategies from design to messaging and offer actionable insights about what companies should do differently.
Nevertheless, AI-driven B2B solutions like Subramanian’s own Binaize, which personalizes conversion-boosting strategies for e-commerce websites, remain few and far between. He sees the B2B area as largely untapped—a new field for recommendation systems. “It hasn’t seen the same kind of evolution as the consumer space,” Subramanian said. “But there’s going to be a huge growth area in the personalization of business offerings.”
Personalization has progressed at a swift clip over the past four years thanks to the dramatic evolution of deep learning models. Natural language processing, which uses neural networks to extract information from long-form text data, has only existed in its present incarnation for two years. This means that the complex image and text-based data used to train current deep learning models was generated in the past few years alone.
If deep learning algorithms have advanced rapidly with only a few years of training data, does this mean that we’ll soon see a totally automated version of personalization in which AI-driven recommendation systems operate without human input?
Not quite, Manjarres said. Machine learning models are built to find patterns across massive data sets, he explained, “but right now, they can’t think for themselves very well in terms of what to go out and look for.”
That’s where humans come in. “Human capacity for intuition is really unparalleled right now, and it’s always going to be a marriage between algorithmically-driven decisions and human intuition,” Manjarres said.
“They’re always going to go hand in hand.”
Is machine learning engineering the right career for you?
Knowing machine learning and deep learning concepts is important—but not enough to get you hired. According to hiring managers, most job seekers lack the engineering skills to perform the job. This is why more than 50% of Springboard’s Machine Learning Career Track curriculum is focused on production engineering skills. In this course, you’ll design a machine learning/deep learning system, build a prototype, and deploy a running application that can be accessed via API or web service. No other bootcamp does this.
Our machine learning training will teach you linear and logistic regression, anomaly detection, and data cleaning and transformation. We’ll also teach you the most in-demand ML models and algorithms you’ll need to know to succeed. For each model, you will first learn how it works conceptually, then the applied mathematics necessary to implement it, and finally how to test and train it.
Find out if you’re eligible for Springboard’s Machine Learning Career Track.