I've written before on my own site about the difficulty of defining Big Data. Long story short, precisely defining the term isn't terribly easy. Over the past few months, I have been researching Big Data extensively. It's been a fun ride, and I have discovered that it's much easier to list the characteristics of Big Data than to settle on a "perfect" definition.
So, in this post, I'll discuss a few of my findings and observations. Ironically, I have no data on the first two assertions. Call them hunches.
1. An organization that keeps its Small Data in respectable shape seems more likely to embrace Big Data. Those mismanaging their structured, transactional data typically don't want to add more data to the mix. I understand the hesitation, but I don't agree with it.
2. Plenty of organizations just don't get it. They think of Big Data as a big waste of time. This is unfortunate; there is tremendous value in Big Data.
Now, on to things about which I am completely sure:
1. Big Data can result in big abuses (read: privacy and security). Color me an optimist, but I like to believe that in business, foul play is the exception, not the rule. Still, it's naive to think that companies aren't using data in ways that would make you cringe. As the saying goes about Facebook: if you're not paying for it, then you are the product.
2. Yes, it is possible to use traditional RDBMSs with limited amounts of unstructured data. I am finding, though, that most progressive folks recognize the inherent limitations of SQL and of extremely long tables. Tools like Hadoop are better suited for unstructured and semi-structured data, and columnar databases handle analytic queries over very large tables far better than row-oriented stores.
3. Data ownership is going to become an even bigger issue than it is now. Bank on it.
Big Data is a big topic, and over the next few months I suspect I'll be revisiting it on this site pretty often. We're out of the very early innings, but much has yet to play out. Most organizations have yet to do very much with Big Data, and I hope to provide some guidance.
What say you?