Decoding ‘Game of Thrones’: Data Science in Westeros
In case you haven’t noticed, “Game of Thrones” is a pretty big deal. In fact, it’s 152 times more in demand than the average TV show, according to Parrot Analytics.
The HBO series, an adaptation of George R. R. Martin’s bestselling books, has had an undeniable cultural impact. It has produced memes, merch, and (thousands of) articles. Its theme song alone has inspired hundreds of YouTube covers. And it’s spawned many data science projects.
Spoilers ahead, obviously.
If you’re among the very few who are not familiar, “Game of Thrones” is about a scramble for power in the fantasy world of Westeros (and Essos). At the center of this struggle is the Iron Throne and, with it, the right to rule the Seven Kingdoms.
The show features a famously long list of colorful characters, and it has a reputation for killing them off just when you’ve grown attached and least expect it. That’s why many of these data science projects attempt to predict when our favorite characters will bite the dust.
Others focus on the language used in the books and the show. Some attempt to predict plot developments and twists. Still others, for fear of the show’s imminent end, aim to keep it alive with AI-created scripts.
What all of these projects have in common: they rely on data.
Let’s check them out.
Projects That Predict Character Deaths
Most “Game of Thrones” data science projects aim to figure out who’s going to die. As Cersei puts it in the show’s seventh episode, “When you play the Game of Thrones, you win or you die. There is no middle ground.”
A Song of Ice and Data (2016-2019), Technical University of Munich
This project from the Technical University of Munich (TUM) is far more popular than others on this list. I couldn’t Google anything related to the show and data science without coming across this project. In fact, I had to use special search operators to exclude it (-TUM -Munich) in order to find all the other projects on this list; that’s just how prevalent it is.
“A Song of Ice and Data” began as a student project. Since then, it’s gained a lot of media attention, likely because it’s had some success predicting deaths. It scrapes data from the Game of Thrones Wiki and the Wiki of Ice and Fire. Then it uses statistical analysis and machine learning to find features that are common to characters who have already died and calculate the likelihood that other characters might die. These features include house, gender, and whether relatives have died. They call the resulting number the percentage likelihood of death (PLOD).
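To make the PLOD idea concrete, here is a minimal logistic-regression sketch. It is not the TUM team’s code: the three features (`is_male`, `is_noble`, and `dead_relatives`, a rough stand-in for gender, house, and relatives’ deaths), the toy rows, and the helper names are all hypothetical. The model learns a weight for each feature from characters whose fate is known, then turns a new character’s features into a probability.

```python
import math

# Hypothetical toy data: (is_male, is_noble, dead_relatives, died).
# Invented for illustration; NOT from the TUM project or any wiki.
TOY_ROWS = [
    (1, 1, 3, 1),
    (1, 1, 2, 1),
    (0, 1, 2, 1),
    (1, 0, 1, 1),
    (0, 1, 0, 0),
    (1, 0, 0, 0),
    (0, 0, 0, 0),
    (0, 1, 1, 0),
]


def sigmoid(z: float) -> float:
    """Squash a log-odds score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, lr=0.1, epochs=2000):
    """Fit logistic-regression weights with plain batch gradient descent."""
    n_features = len(rows[0]) - 1
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for *x, y in rows:
            # Prediction error for this character (predicted minus actual).
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def plod(weights, bias, features) -> float:
    """Return a 'percentage likelihood of death' between 0 and 100."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 100.0 * sigmoid(z)


weights, bias = train_logistic(TOY_ROWS)
print(f"{plod(weights, bias, [0, 1, 3]):.1f}%")  # noble woman, three dead relatives
print(f"{plod(weights, bias, [0, 1, 0]):.1f}%")  # noble woman, no dead relatives
```

With real wiki data you would encode each house as its own feature, use far more rows, and check accuracy on characters held out from training.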
In 2016, it did a pretty stellar job. For example, it correctly predicted a high PLOD for Tommen Baratheon (97%), Stannis Baratheon (96%), and Petyr Baelish (91%).
However, it was not entirely accurate. It also predicted that Daenerys Targaryen had a 95% PLOD. Obviously, that was wrong.
It likewise miscalculated Davos Seaworth’s longevity, reporting that he had a 91% chance of dying.
It somewhat correctly predicted that Jon Snow wouldn’t die. As we know, he did, and then he didn’t.
One of TUM’s data scientists attributed errors, in part, to the fact that George R.R. Martin is no longer involved in the show’s writing process. The series has moved beyond the events of the five published books (two more are planned).
This wouldn’t be machine learning, however, if it didn’t learn from its mistakes. This year, as the new season unfolds, the model has been tweaked and TUM has some new predictions. The people most likely to die in this final season, according to the data, are:
- Bronn (94%)
- Gregor Clegane, aka the Mountain (80%)
- Sansa Stark (73.3%)
- Bran Stark (57.8%)
- Sandor Clegane, aka the Hound (47.5%)
One of the highly-likely-to-die candidates has already met his grisly demise: Ned Umber (77%). Frankly, I think the model this year isn’t quite as bold. The characters most likely to seize the throne—Jon Snow, Daenerys Targaryen, and Cersei Lannister (who tenuously sits on the throne now)—have very low PLODs. I’d bet at least one of them dies (Cersei), but then again, that’s not the data. That’s my gut. We’ll have to wait and see.
This project from May of 2018 looks a lot like the one from TUM. It also predicts the deaths of characters, only it uses data exclusively from the books to predict the events of future books, not the show.
It similarly scrapes data from wikis to analyze approximately 2,000 characters, but it uses LightGBM, a gradient-boosting framework built on decision trees. Because it relies on data from the books, it predicts that Daenerys will die (84%), Jaime will die (73%), Tyrion will die (71%), Bran will die (66%), Cersei will die (60%), and Jon will die (59%). You know what, pretty much everyone dies. According to its predictions, Gendry is the only one who is relatively safe (39%) when the next book is released. It calls to mind this meme.
Network Science Predicts Who Dies Next in ‘Game of Thrones’ (2017), Milan Janosov
Aiming to predict deaths in season 7 of the show, a Ph.D. candidate at the Central European University created an aggregated network of the realm’s social system. His data set came from the show’s dialogue and subtitles, as recorded at genius.com.
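A common way to turn dialogue into a social network is to link characters who speak in the same scene and weight each link by how often that happens. Janosov’s exact construction may differ; the scene list and helper names below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scenes: each is the set of characters who speak in it.
SCENES = [
    {"Jon", "Sansa", "Davos"},
    {"Jon", "Davos"},
    {"Cersei", "Jaime"},
    {"Cersei", "Jaime", "Tyrion"},
    {"Tyrion", "Varys"},
]


def build_network(scenes):
    """Return an edge-weight Counter: how many scenes each pair shares."""
    edges = Counter()
    for scene in scenes:
        # Sorting makes ("Davos", "Jon") and ("Jon", "Davos") the same key.
        for a, b in combinations(sorted(scene), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum each character's edge weights (their 'strength' in the network)."""
    strength = Counter()
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return strength


edges = build_network(SCENES)
print(edges[("Davos", "Jon")])  # 2: they share two scenes
print(weighted_degree(edges))
```

Summing a character’s link weights gives their “strength,” a simple first measure of how embedded they are in the network.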
We have the benefit of hindsight when looking at Janosov’s predictions, as season 7 has played out now. So what did Janosov predict? A bloodbath.
Many of the predictions missed the mark, but some did come to pass. Robin Arryn, Olenna Tyrell, and Petyr Baelish (who was given only about even odds of dying), for example, died. Daenerys, Grey Worm, Jaime, Sansa, Brienne, Varys, Tyrion, Melisandre, Sandor, and most others did not.
Other ‘Game of Thrones’ Data Science Projects
Not all data science projects about “Game of Thrones” predict death. Some attempt to glean other insights. These projects range from comparing the books to Shakespeare’s plays to generating reports on who’s the most popular character. I’ve tried to select some of the most unique, interesting, and well-rounded of them. Let’s take a look.
Comparing George R.R. Martin to William Shakespeare and J.R.R. Tolkien: Decoding ‘Game of Thrones’ by way of data science (2019), Peter Vesterberg
In another project that relies on data from the books, Peter Vesterberg uses natural language processing and machine learning to explore the series numerically: he measures lexical diversity, maps character footprints and word frequencies across the plot, and calculates the significance of characters using network theory. He even compares these findings to analyses of Shakespeare and Tolkien.
The project ran the texts through Python with NLTK, and it used Seaborn for visualization and NetworkX for network metrics and graphs.
Findings for Lexical Diversity
The project discovers that George R.R. Martin has written A LOT. The book series has a total of nearly 1.8 million words. By comparison, the complete works of Shakespeare have about 1 million words. The “Lord of the Rings” (“LOTR”) has about 500,000 words. I guess we can’t make fun of Martin as much for his sluggishness in releasing books anymore.
From there, Vesterberg determines word variability: the share of words that are unique, that is, the number of distinct words divided by the total word count. Martin is at an unfair disadvantage here, because the longer a text runs, the more its words repeat; with far more words than the others, he manages a variability of only 1%. Meanwhile, Shakespeare has a variability of 3% for the complete works (but 13% for “Hamlet” alone). Tolkien’s “LOTR” has a percentage comparable to Shakespeare’s.
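Word variability as described here is usually called the type-token ratio. A minimal sketch, using a simple regular-expression tokenizer rather than NLTK’s (the helper names are hypothetical), shows both the calculation and the length penalty: repeating the same text doubles the word count without adding any new words, so the ratio halves.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct words divided by total words (0.0 for empty text)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


line = "Winter is coming. Winter is here."
print(type_token_ratio(line))               # 4 distinct / 6 total
print(type_token_ratio(line + " " + line))  # same 4 distinct / 12 total
```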
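Finally, he determines lexical density, a measure of how much information a text packs in (commonly calculated as the share of content words, such as nouns and verbs, as opposed to function words like “the” and “of”). Here again, Shakespeare wins, and that win is all the more impressive because his plays are written as speech, which tends to be less dense than prose. Whereas Martin has a density of 58%, Shakespeare has a density of 61%.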
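One rough way to approximate lexical density is sketched below. Proper analyses usually tag parts of speech (NLTK can do this) to separate content words from function words; this sketch assumes a small, hand-picked function-word list instead, so its numbers will not match Vesterberg’s.

```python
import re

# A tiny, illustrative list of function words (an assumption, not a standard);
# real analyses use part-of-speech tags or a much fuller stopword list.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
    "is", "are", "was", "were", "be", "he", "she", "it", "they", "his",
    "her", "i", "you", "we", "that", "this", "with", "for", "not",
}


def lexical_density(text: str) -> float:
    """Share of tokens that are not in FUNCTION_WORDS (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


print(lexical_density("The night is dark and full of terrors"))  # 0.5
```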
Findings for Character Footprints and Word Frequency Analysis
Here it finds that Jon has one of the heaviest footprints, measured by the occurrence of his name. Tyrion, across the five books, has a comparable footprint to Jon. Dany isn’t quite as prevalent until the fifth book.
Words that appear often throughout the series: blood, death, love, father, king, queen, and might. Not too surprising.
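Both measurements boil down to counting. In this sketch the two book snippets are hypothetical stand-ins for the full texts, and `name_footprint` and `top_words` are illustrative helper names.

```python
import re
from collections import Counter

# Hypothetical stand-in snippets; a real analysis would load each book's full text.
BOOKS = {
    "Book 1": "Jon looked north. Tyrion drank. Jon drew his sword.",
    "Book 5": "Daenerys rode her dragon. Jon was stabbed. Daenerys ruled.",
}
CHARACTERS = ["Jon", "Tyrion", "Daenerys"]


def name_footprint(books: dict[str, str], names: list[str]) -> dict[str, dict[str, int]]:
    """Count whole-word, case-sensitive mentions of each name in each book."""
    return {
        title: {name: len(re.findall(rf"\b{re.escape(name)}\b", text)) for name in names}
        for title, text in books.items()
    }


def top_words(text: str, k: int, ignore: set[str]) -> list[tuple[str, int]]:
    """Return the k most frequent lowercase words, skipping any in `ignore`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in ignore).most_common(k)


print(name_footprint(BOOKS, CHARACTERS))
print(top_words(" ".join(BOOKS.values()), k=3, ignore={"his", "her", "was"}))
```

A real footprint count also has to merge aliases, so that “Dany” and “Daenerys” are tallied together, and would drop a full stopword list before ranking words.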
Findings for the Most Important Characters Based on Network Theory
Finally, Vesterberg uses network theory to calculate which characters are most important. Unsurprisingly, based on the occurrences of his name, Jon is probably the most important character. He wins in four different categories for how “central” he is to the story.
- He is directly connected to the most characters (degree centrality).
- He is the least distant from all other characters (closeness centrality).
- He most often sits on the shortest route connecting pairs of other characters (betweenness centrality).
- He is connected to the most well-connected characters (eigenvector centrality).
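Each bullet above corresponds to a standard way of scoring centrality. Vesterberg used NetworkX for his network metrics, and NetworkX provides tested implementations of all four; to show what each one actually measures, here is a dependency-free sketch on a hypothetical six-character network (the edge list and helper names are invented for illustration).

```python
import math
from collections import defaultdict, deque

# Hypothetical toy network; not Vesterberg's data.
EDGES = [
    ("Jon", "Sansa"), ("Jon", "Arya"), ("Jon", "Sam"),
    ("Jon", "Davos"), ("Sansa", "Arya"), ("Sam", "Gilly"),
]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def degree_centrality(adj):
    """Share of other characters each character is directly linked to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def bfs_distances(adj, source):
    """Hop counts from source to every reachable character."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """(reachable characters) / (sum of their distances); assumes a connected graph."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes' algorithm, unnormalized, for an unweighted undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


def eigenvector_centrality(adj, iterations=100):
    """Power iteration on (A + I); the shift avoids oscillation on bipartite graphs."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x


adj = build_adjacency(EDGES)
for measure, scores in [
    ("degree", degree_centrality(adj)),
    ("closeness", closeness_centrality(adj)),
    ("betweenness", betweenness_centrality(adj)),
    ("eigenvector", eigenvector_centrality(adj)),
]:
    print(measure, max(scores, key=scores.get))  # each line names Jon for this toy graph
```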
Building a Game of Thrones Chatbot for Slack: Part 1 Understanding Language (2019), Isaac Godfried
As a means of testing his natural language processing skills, Isaac Godfried built a “Game of Thrones” chatbot for Slack. The goal of the chatbot is simple: to get the bot to provide information about the show or answer questions. As it turns out, the approach is a little more complex.
Godfried lists a number of challenges: gathering labeled training data for deep learning, determining context, and answering questions. In the end, he came up with a formulaic chatbot that still needs some improvement. Eventually, he wants to implement a more unsupervised approach.
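A “formulaic” bot in its simplest form matches keywords to canned answers. The sketch below is not Godfried’s code; the intents, replies, and `reply` function are hypothetical. Its last example shows why such bots need improving: the word “king” in a question about the Night King triggers the answer about the Iron Throne.

```python
import re

# Hypothetical intents: each pairs trigger keywords with a canned answer.
INTENTS = [
    ({"throne", "rule", "rules", "king", "queen"},
     "Whoever holds the Iron Throne rules the Seven Kingdoms."),
    ({"die", "dies", "death", "dead"},
     "When you play the game of thrones, you win or you die."),
]
FALLBACK = "Sorry, I don't know that one yet."


def reply(message: str) -> str:
    """Pick the intent whose keywords overlap the message the most."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    best_answer, best_hits = FALLBACK, 0
    for keywords, answer in INTENTS:
        hits = len(words & keywords)
        if hits > best_hits:
            best_answer, best_hits = answer, hits
    return best_answer


print(reply("Who will sit on the Iron Throne?"))
print(reply("Who dies next?"))
print(reply("Is Bran the Night King?"))  # "king" wrongly triggers the throne intent
```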
How I Used Python to Analyze ‘Game of Thrones’ (2019), Rocky Kev
Rocky Kev used “Game of Thrones” as an excuse to learn Python. Armed with tools such as Selenium and BeautifulSoup, Kev wrote code that automatically logged into ‘GoT’ fansites such as A Forum of Ice and Fire and scraped data from their web pages. He then wrote code that used that data to answer questions and generate CSV reports about the Westerosi universe, such as: who is the most popular “Game of Thrones” character? The answer was Jon, by the way.
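Kev’s workflow (fetch pages, extract posts, count mentions, write a CSV) can be sketched with Python’s standard library alone. This version swaps in `html.parser` for BeautifulSoup, skips the Selenium login, and parses a hypothetical snippet of forum HTML rather than a live site; the markup, the `post` class name, and the helper names are assumptions. Before scraping a real forum, check its terms of use.

```python
import csv
import io
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical forum markup; a real site's HTML will look different.
SAMPLE_HTML = """
<div class="post">Jon is the true king.</div>
<div class="post">Tyrion and Jon should team up.</div>
<div class="post">Nobody beats Tyrion's wit.</div>
<div class="post">Jon knows nothing, but he is brave.</div>
"""
CHARACTERS = ["Jon", "Tyrion", "Daenerys"]


class PostExtractor(HTMLParser):
    """Collect the text of <div class="post"> elements (assumes no nested divs)."""

    def __init__(self):
        super().__init__()
        self.in_post = False
        self.posts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "post") in attrs:
            self.in_post = True
            self.posts.append("")

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_post = False

    def handle_data(self, data):
        if self.in_post:
            self.posts[-1] += data


def mention_counts(posts, names):
    """Number of posts that mention each name at least once."""
    counts = Counter()
    for post in posts:
        for name in names:
            if re.search(rf"\b{re.escape(name)}\b", post):
                counts[name] += 1
    return counts


def to_csv(counts, names):
    """Render the counts as a small CSV report string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["character", "posts_mentioning"])
    for name in names:
        writer.writerow([name, counts[name]])
    return buffer.getvalue()


parser = PostExtractor()
parser.feed(SAMPLE_HTML)
counts = mention_counts(parser.posts, CHARACTERS)
print(to_csv(counts, CHARACTERS))
print("Most popular:", counts.most_common(1)[0][0])  # Jon
```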
Not a bad way to learn Python.
Data Science in Westeros (2018), Yi Shuen Lim
Yi Shuen Lim “nerds out” over all the data science possibilities with the many data sets available for “Game of Thrones.” After looking at these data sets, she decided to put together some visualizations of trends.
One visualization tests whether there’s a correlation between how much screen time specific characters get and how many views the show earns. The results were surprising. It appears that Grey Worm, Tormund, and Euron are great for views.
It also appears that, if a character walked a lot in the show (i.e., traveled the farthest distance), he or she didn’t really get any more screen time for it. Theon, for example, traveled about 20,400 miles, even more than Daenerys, who traveled about 19,700 miles. Nonetheless, Jon, who traveled only 11,600 miles, got the most screen time.
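Questions like these usually start with a correlation coefficient. Below is a minimal Pearson correlation sketch; the screen-time and viewership numbers are invented placeholders, not Lim’s data. A coefficient near 1 or -1 signals a strong linear relationship and one near 0 signals little relationship, though with only a handful of characters the estimate is noisy and says nothing about cause.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical values for five characters; not Lim's data.
screen_minutes = [10, 25, 40, 55, 70]
episode_views_millions = [9.0, 9.5, 8.8, 10.1, 9.7]
print(round(pearson(screen_minutes, episode_views_millions), 3))
```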
Survive the ‘GoT’ Winter With ML and Pachyderm (2019), Pachyderm
As the end of the series approaches, one group of fans that develops AI solutions attempted to prolong the show a bit. They had this thought: “What if we could… build a machine learning pipeline in Pachyderm to generate scripts for us?”
And so it was. The example project is housed on GitHub. While these episodes won’t win any awards, the authors admit, they might distract you from the end of a great show.
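The Pachyderm example builds a full machine learning pipeline. For a sense of how text generation works at its most basic, here is a word-level Markov chain, a different and much simpler technique than that project’s: it records which words follow which, then takes a random walk. The one-line corpus is a quote from earlier in this article, and the helper names are hypothetical.

```python
import random
from collections import defaultdict

# A single quote from earlier in this article stands in for a corpus of scripts.
CORPUS = (
    "When you play the game of thrones, you win or you die. "
    "There is no middle ground."
)


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start: str, max_words: int = 12, seed=None) -> str:
    """Random walk through the chain, stopping at a dead end or max_words."""
    rng = random.Random(seed)  # seeded for reproducible output
    word, output = start, [start]
    while len(output) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


chain = build_chain(CORPUS)
print(generate(chain, "you", seed=7))
```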
‘Game of Thrones’ Fans Predict Who Will Sit on the Iron Throne Using Swarm Platform (2019), Unanimous AI
This project from Unanimous AI uses an aggregation of fans’ gut feelings as data to generate a number of forecasts. Using its “Swarm” platform, the company had fans convene and combined their real-time inputs to produce numerous predictions about season 8.
According to the fans, Jon Snow is the most likely to survive, edging out Dany and Tyrion.
It also predicts the likelihood that certain fan theories will play out. The most likely of these is that Jaime will ultimately kill the woman he so fiercely (and incestuously) loved. Other, wilder theories were judged less likely, including whether Sam would end up narrating the series, whether Bran is actually the Night King, whether Arya is going to become the new Littlefinger, whether Bran built the Wall, and whether Tyrion is also a Targaryen.
Finally, it predicts that Dany will end up pregnant, that another of her dragons will die, and that Sandor Clegane is going to kill his brother, Gregor.
One prediction has already come true: Tormund and Beric survived the attack on the Wall.
Time will tell with the rest.
The End Is Nigh
As these projects demonstrate, data science isn’t limited to generating insights that help businesses increase their bottom line. Sometimes, it’s just wicked fun. If you are a data enthusiast or an aspiring data scientist, you can take on entertaining projects like these to have fun and sharpen your skills at the same time. And in this case, data science has certainly added a level of enjoyment and diversion to one of the biggest fandoms of our time.
I, for one, can’t wait to see how some of the predictions for this season turn out. Meanwhile, it will be exciting to see what else this final season of “Game of Thrones” produces.