Less than a year ago, the country’s attention was on Dallas after the first Ebola patient diagnosed in the U.S. died there. That is where this project begins, and Dallas is also where it was presented, at SAS Global Forum.
Sharat Dwibhasi and his classmates Dheerj Jami and Shivkanth Lanka from Oklahoma State University used tweets to analyze public sentiment about the Ebola outbreak.
Their research involved extracting live streaming data from Twitter over a four-month period and studying patterns in the tweets across the Ebola timeline. They used SAS Enterprise Miner and SAS Sentiment Analysis Studio to evaluate:
- How seriously people are taking the outbreak
- The geographical areas where people are most concerned
- The percentage of tweets that emphasize awareness
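The post does not say how awareness tweets were identified; the students worked in SAS Sentiment Analysis Studio. As a much simpler stand-in, here is a minimal Python sketch that flags a tweet as awareness-related by keyword matching. The keyword list `AWARENESS_TERMS`, the function `awareness_percentage` and the sample tweets are all hypothetical.

```python
import re

# Hypothetical keyword list; the post does not say how awareness tweets were identified.
AWARENESS_TERMS = {"awareness", "symptoms", "prevent", "prevention", "wash", "hands"}


def awareness_percentage(tweets):
    """Return the share of tweets (0-100) that contain at least one awareness term."""
    if not tweets:
        return 0.0
    flagged = sum(
        1
        for text in tweets
        # Lowercase and split on non-letters so "hands," still matches "hands".
        if AWARENESS_TERMS & set(re.findall(r"[a-z]+", text.lower()))
    )
    return 100.0 * flagged / len(tweets)


# Made-up sample tweets for illustration only.
sample = [
    "Know the symptoms of Ebola",
    "So scared right now",
    "Wash your hands, prevent infection",
    "News update",
]
print(awareness_percentage(sample))  # 50.0
```

Keyword matching misses paraphrases and sarcasm, which is one reason a trained sentiment or classification model is usually preferred for this kind of question.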
Collecting the Data
The first step was extracting the data by accessing Twitter’s live streaming API with the tweepy package in Python. They started collecting tweets after the first patient died in the U.S. “We collected tweets from Oct. 8, 2014 to Feb. 15, 2015,” said Dwibhasi. “That was a big part of the project.” The students collected around 270,000 English-language tweets and divided them into three datasets, which helped them categorize the tweets and compare how the mood changed.
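The post does not say how the three datasets were defined. Assuming the split was by date, a minimal Python sketch could look like this. Only the collection window (Oct. 8, 2014 to Feb. 15, 2015) comes from the post; the boundary dates in `PERIOD_BOUNDARIES`, the function `split_into_periods` and the placeholder tweets are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Collection window stated in the post.
COLLECTION_START = date(2014, 10, 8)
COLLECTION_END = date(2015, 2, 15)

# Hypothetical boundaries between the three datasets; the post does not give them.
PERIOD_BOUNDARIES = [date(2014, 11, 15), date(2014, 12, 31)]


def split_into_periods(tweets):
    """Split (date, text) pairs into three lists, one per time period.

    Tweets outside the collection window are dropped.
    """
    periods = [[] for _ in range(len(PERIOD_BOUNDARIES) + 1)]
    for tweet_date, text in tweets:
        if not COLLECTION_START <= tweet_date <= COLLECTION_END:
            continue
        # bisect_right counts how many boundaries are <= tweet_date,
        # which is exactly the index of the period the tweet belongs to.
        periods[bisect_right(PERIOD_BOUNDARIES, tweet_date)].append(text)
    return periods


sample = [
    (date(2014, 10, 1), "placeholder tweet 0"),  # before the window, dropped
    (date(2014, 10, 8), "placeholder tweet 1"),
    (date(2014, 11, 15), "placeholder tweet 2"),
    (date(2015, 2, 15), "placeholder tweet 3"),
]
first, second, third = split_into_periods(sample)
```

With `bisect_right`, a tweet dated exactly on a boundary falls into the later period; swapping in `bisect_left` would put it in the earlier one.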
Next, SAS was used to clean and analyze the data. For the analysis, the students applied natural language processing (NLP) techniques including lemmatization, concept linking and synonym handling.
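To show what lemmatization and synonym handling do to a tweet before sentiment scoring, here is a toy Python sketch. It is not how SAS implements these steps: the lookup tables `LEMMAS` and `SYNONYMS` and the function `normalize` are hypothetical, and concept linking is not covered.

```python
import re

# Toy lemma table (assumption): a real lemmatizer relies on a full dictionary
# and morphological rules rather than a handful of entries.
LEMMAS = {
    "worried": "worry",
    "worries": "worry",
    "died": "die",
    "dies": "die",
    "nurses": "nurse",
    "workers": "worker",
}

# Hypothetical synonym list that maps variant words onto one canonical term.
SYNONYMS = {
    "scared": "afraid",
    "frightened": "afraid",
    "terrified": "afraid",
}

TOKEN_PATTERN = re.compile(r"[a-z']+")


def normalize(text):
    """Lowercase, tokenize, lemmatize, then collapse synonyms."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    normalized = []
    for token in tokens:
        lemma = LEMMAS.get(token, token)
        normalized.append(SYNONYMS.get(lemma, lemma))
    return normalized


print(normalize("Scared and worried: nurses died!"))
# ['afraid', 'and', 'worry', 'nurse', 'die']
```

After normalization, "scared", "frightened" and "terrified" all count as the same term, so one feeling is not split across several surface forms when tweets are scored.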
They discovered that people were initially worried about catching the disease. Over time, the sentiment shifted to caring for those in areas where Ebola was most prevalent. Finally, people began expressing appreciation for Ebola workers and for efforts to find a cure.
They also found that positive tweets about Ebola were retweeted more often than negative ones.
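The post does not describe how retweets were compared. A minimal Python sketch, assuming each tweet already carries a sentiment label (for example, one exported from a sentiment model) and a retweet count, compares the mean retweet count per label. The function `mean_retweets_by_sentiment` and the numbers in `sample` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_retweets_by_sentiment(tweets):
    """Average retweet count for each sentiment label.

    tweets: iterable of (sentiment_label, retweet_count) pairs.
    """
    counts = defaultdict(list)
    for label, retweets in tweets:
        counts[label].append(retweets)
    return {label: mean(values) for label, values in counts.items()}


# Made-up (label, retweet_count) pairs for illustration only.
sample = [
    ("positive", 12),
    ("positive", 4),
    ("negative", 3),
    ("negative", 1),
    ("neutral", 0),
]
averages = mean_retweets_by_sentiment(sample)
print(averages)  # {'positive': 8, 'negative': 2, 'neutral': 0}
```

Retweet counts are usually highly skewed, so comparing medians, or testing whether the difference is significant, is worth doing alongside the means.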
For more details, here’s a link to the paper, Analyzing and visualizing the sentiment of the Ebola outbreak via tweets.