Unless you've been living under a rock, you've probably seen news reports that Russian trolls have been posting on social media to allegedly conduct "what they called information warfare against the United States, with the stated goal of spreading distrust toward the candidates and the political system in general," according to US Deputy Attorney General Rod Rosenstein. NBC recently made available 200,000+ tweets from troll accounts linked to the Russian Internet Research Agency, and I thought it might be interesting to analyze that data with some graphs!
First, I went to the NBC article and followed the link to download the raw data (in the form of a CSV file). The text file has several fields in each line, and the text of the tweets makes the lines somewhat long ... making it difficult to study the 200,000+ lines of data in its raw form.
Therefore, I imported the data into SAS®, where it is much easier to manage. Note that the data was a little tricky, because in addition to the line feed at the end of each data line, the text field could also contain one or more line feeds. So instead of using just the traditional PROC IMPORT, I enlisted the help of Rick Langston, who wrote some custom code in a DATA step to convert the line feeds in the text field to a "/" character before importing the data. Here's a link to the SAS code, if you'd like to see it.
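For readers who don't use SAS, the same cleanup idea can be sketched in Python. This is not Rick's DATA step code, just a minimal illustration that assumes the text field is wrapped in double quotes, as is usual in CSV files. The function name and the sample data are made up for the example.

```python
import csv
import io


def flatten_embedded_newlines(raw_csv: str, replacement: str = "/") -> str:
    """Return CSV text where line breaks inside fields become `replacement`.

    Hypothetical helper for illustration; it assumes fields that contain
    line feeds are enclosed in double quotes.
    """
    # newline="" keeps embedded line breaks intact so the csv module sees them.
    reader = csv.reader(io.StringIO(raw_csv, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cleaned = [
            field.replace("\r\n", replacement)
            .replace("\n", replacement)
            .replace("\r", replacement)
            for field in row
        ]
        writer.writerow(cleaned)
    return out.getvalue()


# Made-up sample: the second record has a line feed inside the text field.
sample = 'user,text\nalice,"line one\nline two"\nbob,plain\n'
flattened = flatten_embedded_newlines(sample)
print(flattened)
```

After this step every record sits on exactly one physical line, so a simple line-by-line import no longer splits a tweet in two.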
Timing of Troll Tweets
For my first graph, I wanted to see when the tweets were posted. I recycled some custom code I had used to plot President Trump's tweets in a previous blog post, and quickly had a nice timeline of the Russian trolls' tweets, showing what year and month they were posted (horizontal axis) and the time of day (vertical axis). It looks like most of them were posted mid-2016 to early 2017, and during that time the trolls were posting 24 hours a day.
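The timeline boils down to splitting each tweet's timestamp into a date (horizontal axis) and a time of day (vertical axis). Here is a rough Python sketch of that step, not the SAS code I actually used. The timestamp format and the function name are assumptions for illustration, so adjust them to whatever the CSV really contains.

```python
from datetime import datetime


def timeline_point(timestamp: str, fmt: str = "%Y-%m-%d %H:%M:%S"):
    """Split a timestamp into (date, fractional hour of day).

    Hypothetical helper; the default format string is an assumption
    about how the timestamps are written.
    """
    posted = datetime.strptime(timestamp, fmt)
    hour_of_day = posted.hour + posted.minute / 60 + posted.second / 3600
    return posted.date(), hour_of_day


# Made-up timestamp, not a record from the NBC data.
x, y = timeline_point("2016-11-08 18:30:00")
print(x, y)
```

Plotting one marker per tweet at (x, y) gives the kind of scatter described above, with a vertical axis running from 0 to 24 hours.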
Most Active Trolls
200,000+ tweets is a lot of data to try to wrap your head around. Therefore I decided to subset the data for my next graph, and just focus on the troll accounts that had posted at least 1,000 tweets. I used PROC SQL to create some summary counts, and came up with a more manageable subset. In the resulting graph, each circle marker represents one tweet, and I colored the circles red for tweets that had been marked as retweeted or favorited (liked).
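The subsetting logic is simple enough to sketch outside of SAS. The Python below is only an illustration of the idea (count tweets per account, keep accounts at or above a threshold, and flag tweets with any retweets or likes). The field names "user", "retweets", and "likes" are hypothetical stand-ins, not the real column names.

```python
from collections import Counter


def active_accounts(tweets, min_tweets=1000):
    """Return {account: tweet count} for accounts with at least min_tweets."""
    # "user" is a hypothetical field name for the account that posted the tweet.
    counts = Counter(tweet["user"] for tweet in tweets)
    return {user: n for user, n in counts.items() if n >= min_tweets}


def has_engagement(tweet):
    """True if the tweet was retweeted or liked at least once (the red markers)."""
    # "retweets" and "likes" are hypothetical field names.
    return tweet.get("retweets", 0) > 0 or tweet.get("likes", 0) > 0


# Tiny made-up data set, with a lowered threshold so the filter is visible.
tweets = [
    {"user": "troll_a", "retweets": 2, "likes": 0},
    {"user": "troll_a", "retweets": 0, "likes": 0},
    {"user": "troll_a", "retweets": 0, "likes": 5},
    {"user": "troll_b", "retweets": 0, "likes": 0},
]
busy = active_accounts(tweets, min_tweets=2)
subset = [t for t in tweets if t["user"] in busy]
flags = [has_engagement(t) for t in subset]
print(busy, flags)
```

With the real data you would leave min_tweets at its default of 1000, matching the cutoff used for the graph.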
Most Popular Tweets
For my final graphs, I drilled down to the individual tweet level. These next two graphs show how many times the most popular tweets were retweeted or liked. I know, I know ... these two graphs aren't the most visually pleasing (too many colors, and the tweet ID numbers are a bit overbearing), but I couldn't think of another way to show this information (maybe you've got an idea or two you can share with me in the comments?).
The graphs above are just static images, but if you click them you can see the interactive versions with HTML mouse-over text. The second and third graphs even allow you to drill down to the archived pages for these Twitter users. (Twitter has deleted their accounts, but the Wayback Machine saved some snapshots of some of the users' Twitter pages.)
Warning: Drill down to the individual tweets (in the two bar charts) at your own risk! If you're like me, you are curious to see exactly what kind of stuff the Russian trolls were posting in their tweets. But keep in mind that their goal is to "spread distrust."
Note: These 200,000+ tweets are probably just a small subset of the social media posts made by Russian trolls, but hopefully they give a good snapshot of what the trolls were up to, and what to be on the lookout for.
Finally, I think the following quote sums things up quite nicely:
"Don't believe everything you read on the internet." - Abraham Lincoln