Twitter is a big data problem


[Image: Filtering large data volumes]

Back in my day-one post of this "HPA once a day project," I promised a post about Twitter as "big data."

I know some of you are already moaning about the noise on Twitter and the "what I ate for breakfast" type of commentary that's prevalent there. So I'm going to promise right off the bat not to give you a perky "Twitter is whatever you make it" type of response. No, the fact is, Twitter is noisy. And it's often irrelevant. And distracting. And sometimes a whole lot of fluff. I'll admit all of that up front.

In fact, if I had to estimate, I could feel pretty confident saying that way less than 1 percent of the content there is relevant to your business. Right? I doubt anybody is going to argue with me on that.

But I am going to challenge you to think of Twitter - and other social content - differently when it comes to your business. Instead of seeing these sites as information sources or social conversation sites, let's think of them as event streams.

Let me explain.

A lot of smart companies are already using complex search taxonomies and semantic technologies to pull relevant nuggets of information out of Twitter and other social sites. Primarily, they are using this information for customer support and product development.

The smartest companies are not just looking at Tweets and comments about their companies, though. They're looking at conversations about their industry and the broader needs of potential customers.

To do that, you need a very robust strategy for sifting through all of the information out there on the Web. To do it effectively, you need to find the relevant nuggets, combine them with other data sources and act on them faster than your competitors.

That process of sifting through exabytes of data in real time to determine what's relevant and group like data together for analysis? That's complex event processing. And that's a big data problem.
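To make the "event stream" idea a little more concrete, here is a minimal Python sketch of the filter-then-aggregate pattern. It is only an illustration, not how any particular product works: the `Event` class, the `TAXONOMY` keywords, and the `spikes` function are all hypothetical names made up for this example, and the keyword matching is far cruder than the semantic technologies mentioned above. It also leaves out the step of combining the results with other data sources.

```python
from __future__ import annotations

from collections import Counter, deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """One item in the stream (hypothetical shape: a timestamp and some text)."""
    timestamp: float  # seconds since an arbitrary start point
    text: str


# Hypothetical taxonomy: topic -> keywords that signal it.
TAXONOMY = {
    "pricing": {"price", "expensive", "cheap"},
    "support": {"broken", "help", "outage"},
}


def tag(event: Event) -> set[str]:
    """Return the topics whose keywords appear in the event text.

    Naive whitespace tokenizing: punctuation attached to a word will
    prevent a match, which is fine for a sketch.
    """
    words = set(event.text.lower().split())
    return {topic for topic, keywords in TAXONOMY.items() if words & keywords}


def relevant(events: Iterable[Event]) -> Iterator[tuple[Event, set[str]]]:
    """Drop the noise: keep only events that match at least one topic."""
    for event in events:
        topics = tag(event)
        if topics:
            yield event, topics


def spikes(
    events: Iterable[Event],
    window_seconds: float = 60.0,
    threshold: int = 2,
) -> Iterator[tuple[float, str]]:
    """Yield (timestamp, topic) each time a topic has at least `threshold`
    mentions inside the sliding time window. Assumes events arrive in
    timestamp order."""
    window: deque[tuple[float, str]] = deque()
    counts: Counter[str] = Counter()
    for event, topics in relevant(events):
        # Expire mentions that have fallen out of the window.
        while window and window[0][0] <= event.timestamp - window_seconds:
            _, old_topic = window.popleft()
            counts[old_topic] -= 1
        for topic in sorted(topics):
            window.append((event.timestamp, topic))
            counts[topic] += 1
            if counts[topic] >= threshold:
                yield event.timestamp, topic


stream = [
    Event(0, "what I ate for breakfast"),
    Event(5, "The service is broken again"),
    Event(20, "need help with login"),
    Event(200, "total outage here"),
]
alerts = list(spikes(stream))
```

In this toy stream, the breakfast tweet is discarded as noise, and the two support-related mentions that land within a minute of each other are the kind of pattern a company would want to act on. The real problem is doing this across billions of events, with much richer matching, fast enough to matter.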

SAS CTO Keith Collins talked about the information flow challenge in a post last year where he referred to public safety intelligence as an example. I challenge you to apply that same type of thinking to your Web monitoring.

If you did not have constraints on the amount of data, the speed of analysis or the complexity of the search algorithm, what could your company do with social media data? How could you use it as a source of intelligence about your competitors, your marketplace and the broader customer needs that have yet to be met?

These are the types of problems you can solve with high performance analytics. And that's why we're all really excited about HPA.

This is day four of my "HPA once a day" blog post series. To read more, see all of the high-performance analytics posts on this blog or follow the high-performance analytics RSS feed.


About Author

Alison Bolen

Editor of Blogs and Social Content

Alison Bolen is an editor at SAS, where she writes and edits content about analytics and emerging topics. Since starting at SAS in 1999, Alison has edited print publications, Web sites, e-newsletters, customer success stories and blogs. She has a bachelor’s degree in magazine journalism from Ohio University and a master’s degree in technical writing from North Carolina State University.


  1. Once you learn to follow the hashtags (#sastip, #sasuser), Twitter becomes a lot more relevant. My own usage increased when I realized there was a conversation about #bi, #dashboard, and #analytics. I found many more people who were tweeting about topics I actually did find useful and articles I did want to read. I discovered All Analytics that way ... which was a good find, along with a few others.

    In this blog post, I talk about how to add the Twitter hashtag stream to your SAS BI Portal - but I think you could also use similar code to collect the "conversation" for better analysis.

    • Alison Bolen

      Hi Tricia. I agree that on the level of personal use, Twitter is not a big data problem at all, especially once you discover how to organize things by hashtags and learn a few other tricks. It does become a big data problem, though, when you're looking at the whole universe of tweets from the perspective of a large company and trying to decide which of those billions of bits of information might contain competitive insight not just about your company but also about your market. That's when the billions of tweets go from noise to becoming a worldwide information stream that is worth processing and analyzing.

  2. Twitter is a very noisy area but if you approach it with a strategy you can find some great information and even release some great information if you have something relevant to say. Sometimes it might feel like a numbers game depending on which side of the marketing wall you are on.

  3. Pingback: Big data answers for your industry and your role - SAS Voices

  4. Hi Alison. I love the graph so much in this article and I wonder if I could use it as a big data problem example in my project presentation with this blog address attached? I'm a student studying IT at the University of Melbourne, this presentation is for a report on the comparison of Hadoop, Spark and Flink and the subject is named Advanced Database.
