Are you a fan of the television series Game of Thrones or text mining? If your answer is yes and/or yes – then this blog is perfect for you. And don’t worry – I won’t reveal any spoilers.
Brad Gross and Srividhya Naraharirao came up with the idea of running a text analysis on the book series “A Song of Ice and Fire” by George R. R. Martin (the books behind the hit HBO series Game of Thrones). They presented their findings at the Student Symposium at the 2015 SCSUG Educational Forum at Louisiana State University.
Gross and Naraharirao are both pursuing master’s degrees in analytics at LSU, and this project gave them their first chance to learn more about text analysis.
“We found a SAS article on text analysis and thought we could change this to something more relevant to us,” said Gross. “We went in with no knowledge, but once we started to see what SAS was capable of we started to see what we could accomplish using it and that drove us down our path.”
The students set out to determine three things: narrator traits, based on common text clusters and factor analysis; character qualities, based on commonly used words; and relationship strength, based on interactions.
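To make the “relationship strength based on interactions” idea a little more concrete, here is a minimal Python sketch. It is not the students’ method (they worked in SAS), and it assumes that an “interaction” simply means two characters being named in the same passage. The function name, the character names, and the sample passages are all invented for illustration.

```python
from collections import Counter
from itertools import combinations
import re


def interaction_counts(passages, characters):
    """Count how often each pair of characters is named in the same passage.

    Assumption: co-occurrence in a passage stands in for an "interaction".
    """
    counts = Counter()
    for passage in passages:
        # Lowercase the passage and collect its distinct words.
        words = set(re.findall(r"[a-z]+", passage.lower()))
        # Characters mentioned in this passage, sorted so pairs are consistent.
        present = sorted(name for name in characters if name.lower() in words)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Invented placeholder passages, not text from the books.
passages = [
    "Alice spoke with Bob in the hall.",
    "Bob and Carol rode north.",
    "Alice wrote to Bob again.",
]
strength = interaction_counts(passages, ["Alice", "Bob", "Carol"])
print(strength.most_common())
```

Pairs that share more passages get higher counts, which is one simple way to rank relationships. A fuller analysis would also need to handle nicknames, titles, and characters who share a name.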
The analysis was done using SAS Text Miner nodes, including text import, text parsing, text filter, text profile, text cluster, and text topic, for pattern discovery. They also used SAS Enterprise Miner nodes such as filter, data partition, metadata, regression, and save data for data processing and predictive modeling.
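If you have never seen what parsing and filtering steps do to text, the rough Python sketch below may help. It is only loosely analogous to the text parsing and text filter nodes named above and does not reproduce how SAS implements them. The stop-word list, the `min_docs` threshold, the function names, and the sample sentences are assumptions made for illustration.

```python
from collections import Counter
import re

# Tiny illustrative stop-word list (an assumption, not SAS's list).
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "is"}


def parse(document):
    """Lowercase a document, split it into alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def filter_terms(documents, min_docs=2):
    """Keep terms that appear in at least `min_docs` documents.

    Returns a dict mapping each kept term to its document count.
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document.
        doc_freq.update(set(parse(doc)))
    return {term: n for term, n in doc_freq.items() if n >= min_docs}


# Invented placeholder sentences, not text from the books.
documents = [
    "The king rides to the castle.",
    "A raven flies to the castle.",
    "The king is in the north.",
]
kept = filter_terms(documents)
print(kept)
```

Terms that survive the filter are the ones later steps, such as clustering or topic discovery, would work with.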
“About 60 percent of the time was spent in data preparation,” said Naraharirao. “SAS Enterprise Miner was so user friendly that it was easy for us. The analysis was so intuitive in Enterprise Miner.”
They both said that they learned a lot from completing this project and getting the opportunity to present it to a room full of students and professionals.
“You think you know a story very well, but the way the text analysis starts to make predictions you see things you had never really picked up before,” said Gross.
Want to dive deeper into their analysis and insights? I posted the full presentation in the SAS text mining community. But I must warn you – the presentation may contain spoilers.