Twitter is a big data problem

Back in my day-one post of this "HPA once a day" project, I promised a post about Twitter as "big data."

I know some of you are already moaning about the noise on Twitter and the "what I ate for breakfast" type of commentary that's prevalent there. So I'm going to promise right off the bat not to give you a perky "Twitter is whatever you make it" type of response. No, the fact is, Twitter is noisy. And it's often irrelevant. And distracting. And sometimes a whole lot of fluff. I'll admit all of that up front.

In fact, if I had to estimate, I'd feel pretty confident saying that well under 1 percent of the content there is relevant to your business. Right? I doubt anybody is going to argue with me on that.

But I am going to challenge you to think of Twitter - and other social content - differently when it comes to your business. Instead of seeing these sites as information sources or social conversation sites, let's think of them as event streams.

Let me explain.

A lot of smart companies are already using complex search taxonomies and semantic technologies to pull relevant nuggets of information out of Twitter and other social sites. Primarily, they are using this information for customer support and product development.

The smartest companies are not just looking at tweets and comments about their own companies, though. They're looking at conversations about their industry and the broader needs of potential customers.

To do that, you need a robust strategy for sifting through all of the information out there on the Web. To do it effectively, you need to find the relevant nuggets, combine them with other data sources and act on them faster than your competitors.

That process of sifting through exabytes of data in real time to determine what is relevant and to combine related data for analysis? That's complex event processing. And that's a big data problem.
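To make the "event stream" idea concrete, here is a minimal single-machine sketch of the filter-then-combine pattern: drop irrelevant posts as they arrive, tag the rest by topic, and keep running counts over a sliding time window. It is an illustration under stated assumptions, not SAS's implementation or the API of any complex event processing product. The `Post` record, the `TOPICS` keyword table and the window size are hypothetical, and real relevance scoring would use search taxonomies and semantic technologies rather than exact word matches.

```python
from __future__ import annotations

from collections import Counter, deque
from dataclasses import dataclass
from typing import Iterable, Iterator


# Hypothetical event shape: one social post with a timestamp (seconds) and text.
@dataclass
class Post:
    ts: float
    text: str


# Hypothetical relevance rules: topic -> keywords that mark a post as relevant.
TOPICS = {
    "analytics": {"analytics", "#analytics", "bi", "#bi"},
    "dashboards": {"dashboard", "#dashboard"},
}


def classify(post: Post) -> set[str]:
    """Return the topics whose keywords appear in the post (empty set means noise)."""
    words = set(post.text.lower().split())
    return {topic for topic, keys in TOPICS.items() if words & keys}


def sliding_topic_counts(
    stream: Iterable[Post], window: float
) -> Iterator[tuple[float, Counter]]:
    """Filter a time-ordered stream and yield per-topic counts for the window
    ending at each relevant post's timestamp."""
    recent: deque[tuple[float, str]] = deque()  # (timestamp, topic) pairs in window
    counts: Counter = Counter()
    for post in stream:
        topics = classify(post)
        if not topics:
            continue  # drop irrelevant posts early; most of the stream ends here
        for topic in topics:
            recent.append((post.ts, topic))
            counts[topic] += 1
        # Evict events that have fallen out of the time window.
        while recent and recent[0][0] <= post.ts - window:
            _, old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        yield post.ts, Counter(counts)


posts = [
    Post(0, "Great #analytics talk"),
    Post(1, "what I ate for breakfast"),
    Post(2, "new dashboard and analytics"),
    Post(20, "BI rocks"),
]
for ts, snapshot in sliding_topic_counts(posts, window=10):
    print(ts, dict(snapshot))
```

The breakfast post never reaches the counters, which is the point: at scale, the cheap early filter is what keeps the later combining and analysis tractable. Keeping only a window of recent events, instead of the full history, is one common way to bound memory on an unbounded stream.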

SAS CTO Keith Collins discussed the information flow challenge in a post last year, using public safety intelligence as an example. I challenge you to apply that same type of thinking to your Web monitoring.

If you did not have constraints on the amount of data, the speed of analysis or the complexity of the search algorithm, what could your company do with social media data? How could you use it as a source of intelligence about your competitors, your marketplace and the broader customer needs that have yet to be solved?

These are the types of problems you can solve with high performance analytics. And that's why we're all really excited about HPA.

This is day four of my "HPA once a day" blog post series. To read more, see all of the high-performance analytics posts on this blog or follow the high-performance analytics RSS feed.

tags: big data, high-performance analytics, social media analytics


  1. Posted March 2, 2012 at 9:09 am

    Once you learn to follow the hashtags (#sastip, #sasuser), Twitter becomes a lot more relevant. My own usage increased when I realized there was a conversation about #bi, #dashboard, and #analytics. I found many more people tweeting about topics I actually found useful, and articles I wanted to read. I discovered All Analytics that way, which was a good find, along with a few others.

    In this blog post, I talk about how to add the Twitter hashtag stream to your SAS BI Portal - but I think you could also use similar code to collect the "conversation" for better analysis.

    • Alison Bolen
      Posted March 2, 2012 at 9:21 am

      Hi Tricia. I agree that on the level of personal use, Twitter is not a big data problem at all, especially once you discover how to organize things by hashtags and learn a few other tricks. It does become a big data problem, though, when you're looking at the whole universe of tweets from the perspective of a large company and trying to decide which of those billions of bits of information might contain competitive insight not just about your company but also about your market. That's when the billions of tweets go from noise to becoming a worldwide information stream that is worth processing and analyzing.

  2. Posted March 5, 2012 at 2:18 pm

    Twitter is a very noisy area, but if you approach it with a strategy, you can find some great information and even share some great information of your own if you have something relevant to say. Sometimes it might feel like a numbers game, depending on which side of the marketing wall you are on.
