Most folks who know me know I'm a bit of a Star Wars geek. I've analyzed the original trilogy scripts and documented my findings in a paper called "Star Wars and the Art of Data Science." I'm always looking for excuses to get my hands into Star Wars data, and May the 4th is a great annual excuse!
Now that "The Force Awakens" has been around for almost half a year, I thought I'd take the opportunity to delve into it. Many aficionados have claimed that "The Force Awakens" is nothing more than a rehash of "A New Hope." What do I think? I think there are some striking similarities. Desert planet, droids, budding Jedi, super-weapon, battles, rebels ... it's a formula that worked the first time, and, I have to admit, I liked it the second time around as well!
I thought it would be fun to use a little text analytics and visualization to see if the scripts themselves are thematically similar. Sometimes letting the data tell the story can be more eye-opening than conjecture, plus it gives you a nice stake in the ground if you want to get into a heated debate! These examples are not all-encompassing, and are just meant as a bit of a teaser!
I have to give a shout-out to my husband, Adam Maness, from the SAS Data Management practice, for whipping the data into shape. Adam is a true data Jedi. He did all of the data work for the paper I linked to above, and used those same processes to ingest "The Force Awakens" script. I downloaded the scripts from the Internet Movie Script Database.
For this quick analysis, I decided to pull the data into SAS Contextual Analysis. I thought, since I wanted to compare "A New Hope" and "The Force Awakens," it would be fun to look at topic clusters and see if there were any overlaps. Spoiler alert ... there are totally overlaps!
This first screenshot shows the topic clusters that were pulled out of "A New Hope." If you're a fan, none of these should be a surprise. We have a cluster about droids, followed by a cluster with Han and Chewie, followed by a little antagonist/protagonist action, and the list goes on.
Here's a screenshot showing the topic clusters for "The Force Awakens":
"The Force Awakens" clusters also include a droid cluster, a Han and Chewie cluster, and an antagonist/protagonist Kylo Ren and Rey cluster. If we look at the fourth cluster for "A New Hope," there's mention of the death star, the movie's planet-destroying super weapon. The sixth cluster for "The Force Awakens" mentions an oscillator, which is another planet-destroying super weapon. The oscillator cluster is also a cluster that contains pilots. "A New Hope" also has a pilot cluster that mentions the red and gold leader. Finally, the last cluster in both screenshots contain people that help fuel an untrained Jedi's quest for knowledge.
I thought it would be fun to look at a couple of word clouds in SAS Visual Analytics as well. Here's a word cloud from the beginning of "The Force Awakens" that deals with a crash on a desert planet with sand and dunes (Jakku).
...and a word cloud from "A New Hope" that mentions the Sandpeople and the desert on Tatooine.
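Under the hood, a word cloud is really just a term count with the most frequent words drawn biggest. Here's a small Python sketch of that counting step. It's an illustration only, not what SAS Visual Analytics does internally. The stopword list is a tiny made-up one, and the sample sentence is a placeholder I wrote, not a line from either script.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def term_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into words, and count the non-stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Placeholder text, NOT an excerpt from either script.
sample = "The ship crashed in the sand. Sand and dunes stretch across the desert."

freqs = term_frequencies(sample)
for term, count in freqs.most_common(3):
    print(term, count)
```

The counts from a function like this are what a word cloud tool would scale into font sizes, so running it on two scenes and comparing the top terms gives you a text-only version of the side-by-side above.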
What do you think? Did you love "The Force Awakens" because it felt a bit nostalgic? I'll let you decide, and feel free to weigh in in the comments! One thing is for certain: using analytics on Star Wars data is always fun. Consider this a first pass at this topic. I feel a disturbance in the Force that indicates a sequel. Stay tuned for more, and until then, may the Force be with you!