With all the hype surrounding the show at the moment I thought it the perfect time to investigate how Data Science can be applied to everyone’s favourite TV show.
Over the next few weeks I’m going to be developing some analysis centring around Game of Thrones.
Depending on how well the articles are received and how many interesting facts I can glean from the data I’ve sourced, I may end up with several articles (I might even turn this into a ‘series’ — I’m new to Medium so not fully sure what this entails) or it may only be a few.
The first couple of posts at least will be based around transcripts of the first 7 seasons.
This opening article will – for now – breeze over how I actually got hold of the transcripts (I may go into more detail in future, it wasn’t straightforward!) and will dive straight in to some of my initial discoveries — many of which made me laugh because of how well they represented the characters and storylines from the show.
All the data in this post was scraped from the genius.com lyrics website, which also (surprisingly) has transcripts of every Game of Thrones episode.
I used a couple of different scraping algorithms that I wrote myself in R.
Plenty of data wrangling later, I had a transcript for every episode of Game of Thrones up until the end of season 7.
I can’t speak to the ‘official’ accuracy of these but having read through several of them they seem to be pretty spot on.
If you had to guess, which character from the show is most similar to Jon Snow?

So I now have my dataset of every word spoken by every character in the first seven seasons of the show, from Jon Snow to “Soldier #6”.
The first and simplest way to analyse a particular character is to see which word(s) they use most commonly.
After removing all stop-words (‘a’, ‘of’, ‘the’ etc.), I had a look at what everyone’s favourite word was… well, by ‘everyone’ I mean 16 of the biggest characters in recent seasons.
There were over 400 unique characters, so I had to be picky!

Below you can see the results, and I think this might be my favourite table I’ve ever produced, given how well it describes each character.
Overall most commonly used word for each of the selected characters, along with its relative frequency (the % of all words they speak that are this particular word).

Some comments:

If you guessed Jaime in answer to the question above, the data (here) agrees with you. I think that speaks to his and Jon’s leadership qualities and selfless nature (or Jaime uses it incestuously and Jon uses it anti-White-Walkerishly).
I think if you had to describe Arya, Euron & The Hound (Sandor Clegane) in one word, you probably couldn’t do much better than that.
Tywin Lannister clearly had a significant impact on the lives of two of his three children, it seems…
You can tell who has had to learn to be the most courteous/political: Littlefinger (Petyr Baelish), Varys & Sansa.
If you’re wondering why Dany’s top word is ‘take’, think about her life #goals: to ‘take’ back the Iron Throne. I imagine it’s a common word among all conquerors.
If you’ve ever wondered what Yara spends all her time worrying about, just know that her second most common word was ‘brother’…

I think this goes to show that the language we speak reflects considerably upon us as individuals.
Obviously this is heightened in a TV show where scripts are written specifically to portray a particular character/trait, but I think this still holds true in everyday life.
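For anyone who wants to try a favourite-word table like the one above on their own transcripts, here is a minimal sketch of the calculation. It is written in Python purely for illustration (the scraping for this post was done in R), and everything named in it is an assumption: the tiny `STOP_WORDS` set, the `tokenize` and `top_word` helpers, and the made-up sample lines. It also assumes the relative frequency is measured against all words a character speaks, stop-words included.

```python
from collections import Counter
import re

# Tiny illustrative stop-word list (an assumption); a real analysis
# would use a much fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "i", "you",
              "is", "it", "in", "my", "me", "we", "will"}

def tokenize(text):
    """Lower-case the text and pull out words, keeping simple contractions."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def top_word(lines):
    """Return (word, share) for the most common non-stop-word, or None.

    share = occurrences of that word / all words spoken (stop-words included).
    """
    words = [w for line in lines for w in tokenize(line)]
    counts = Counter(w for w in words if w not in STOP_WORDS)
    if not counts:
        return None
    word, n = counts.most_common(1)[0]
    return word, n / len(words)

# Made-up lines for illustration only, not taken from the real transcripts.
sample = {
    "Character A": ["I will kill you", "Kill the king"],
    "Character B": ["My father is the king", "Father knows"],
}
favourites = {name: top_word(lines) for name, lines in sample.items()}
```

With real transcripts, `sample` would map each character to every line they speak, and the share could be multiplied by 100 to get the percentage shown in the table.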
Who do you think likes the sound of their own voice?

Next up we’ll look at quantity over quality.
This gives an indication both of screen time (i.e. who is considered a ‘main’ character in the TV show) and of who is the least succinct.
The plot below shows, for the same 16 characters, how many words they spoke across all seasons as well as their average words per episode they appeared in.
The total number of words spoken across all 7 seasons and the average number of words per episode.
Ordered by total number of words spoken.
It seems the Lannisters live by their House words (“Hear me roar”) as all three of those that are still alive top the charts for pure volume of words spoken.
After this the rest of the list is a bit unsurprising: Jon & Dany up there due to their status as main characters and Varys & Littlefinger due to their eloquent nature.
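The two numbers behind that plot (total words, and average words per episode a character appeared in) are simple to compute. Below is a rough Python sketch, not the code used for this post. The row layout `(character, season, episode, text)`, the `verbosity` function name and the sample rows are all assumptions, and it treats "appeared in" as "spoke at least one line in".

```python
from collections import defaultdict

def verbosity(rows):
    """rows: iterable of (character, season, episode, text) tuples.

    Returns {character: (total_words, avg_words_per_episode_appeared_in)}.
    An episode counts as "appeared in" if the character has a line in it.
    """
    totals = defaultdict(int)
    episodes = defaultdict(set)
    for character, season, episode, text in rows:
        totals[character] += len(text.split())
        episodes[character].add((season, episode))
    return {c: (totals[c], totals[c] / len(episodes[c])) for c in totals}

# Made-up rows for illustration only.
rows = [
    ("Character A", 1, 1, "one two three"),
    ("Character A", 1, 2, "four five six seven eight"),
    ("Character B", 1, 1, "just two"),
]
stats = verbosity(rows)

# Order by total number of words spoken, largest first, as in the plot.
ranked = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)
```

Dividing by episodes appeared in, rather than by all episodes, stops characters who arrive late or leave early from looking quieter than they are.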
Lastly: mapping a journey with words

This final section will look at how the characters’ speech has varied across seasons 1–7.
Will the data show patterns and storylines that we’ve observed in the show? The answer appears to be yes; it often does.
Below is a series of plots mapping how talkative the characters are in each season.
The lines show the average number of words per episode, for each season.
As you can see, I have added some annotations at interesting points where a character’s verbosity matches their storyline (note: it may be hard to see the details on the plot above; the y-axis is ‘Average number of words per episode’ and the x-axis is ‘Season’).
The captions are discussed below.
Cersei appears to have lost her confidence after her naked walk of shame & Myrcella’s death at the end of season 5, although it was evidently just a rehabilitation period for her, as she was back with a vengeance in season 7.
Sansa appears to have steadily grown in confidence, barring season 5, the duration of which was spent as Ramsay’s wife (not a pleasant fate).
Theon was riding high in season 2, where he featured prominently as Robb’s right-hand-man-turned-traitor and subsequently took Winterfell. Unfortunately it was downhill from there for poor Theon…

Below we see how a character’s vocabulary can also reflect their storyline.
The table shows each character’s most commonly used word in each season.
The most commonly spoken word by each character in each season they appeared in.
Again, this is not including stop-words so although it is the “most common”, some of them may only have been used a handful of times in each season.
The ‘Character’ column has been highlighted for ease of interpretation.
The more eagle-eyed among you will have spotted that although Arya’s most common word was ‘kill’ (see the first table in this article), it wasn’t her most common for any particular season.
Evidently ‘kill’ was more of a consistent running theme throughout the whole of Arya’s story, rather than being at the forefront of her vocabulary for any one season.
I think Daenerys’ words give quite a good indication of what her story has been so far.
Ramsay in Season 4: this is an example of where an anomaly can sway the data, this one coming from a single scene in episode 2 – ‘Tansy’ is a serving girl who Ramsay hunts and kills with his dogs.
During the hunt he repeatedly shouts ‘Tansy, Tansy, Tansy’, which leads to it being his most commonly used word for the season.
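A per-season version of the favourite-word calculation can be sketched in a few lines of Python. As before, this is an illustration under assumptions, not the code behind the table: the `(character, season, text)` row layout, the `top_word_by_season` name, the small stop-word set and the sample rows are all made up. Returning the count alongside the word makes it easier to spot cases where a single scene dominates a season.

```python
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "the", "of", "is", "are", "where"}

def top_word_by_season(rows):
    """rows: iterable of (character, season, text) tuples.

    Returns {(character, season): (word, count)} for the most common
    non-stop-word. Pairs with no non-stop-words are left out.
    """
    counts = defaultdict(Counter)
    for character, season, text in rows:
        for raw in text.lower().split():
            word = raw.strip(".,!?'\"")  # drop surrounding punctuation
            if word and word not in STOP_WORDS:
                counts[(character, season)][word] += 1
    return {key: c.most_common(1)[0] for key, c in counts.items()}

# Made-up rows for illustration only.
rows = [
    ("Character A", 1, "Run, run, run!"),
    ("Character A", 1, "The dogs are hungry"),
    ("Character A", 2, "Home sweet home"),
]
by_season = top_word_by_season(rows)
```

In the sample, one shouted line is enough to make a word the season's favourite, which is the same effect as the ‘Tansy’ anomaly described above.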
Up next…

Hopefully you enjoyed this short piece of analysis and found it as interesting as I did! For my next post I’m going to be using the same dataset and performing some sentiment analysis to see how the characters’ emotions throughout the show can be mapped by the words they say.
I’m also going to consider developing a web app for this dataset.
As I mentioned there are over 400 characters and nearly 300,000 words so I’ve barely scratched the surface here.
If you’d be interested in a web app that allows you to explore and visualise this data in multiple ways then let me know and I’ll keep you updated on any relevant plans I have!

Thanks for reading, leave a comment and share with anyone you think might find this interesting!

#winterhascome

P.S. I might have missed out an important character in the ‘favourite word by season’ table above…