In the previous post, I described how, at least to some extent, organizations can use data to build moats against their competition. I focused on the business side of the table, although I did touch upon the need to adopt new tools to make sense of what we affectionately call big data.
Today, I'll get a bit more technical.
I'll be the first to admit that the question in the title of this post is a tad misleading. For one, it implies that organizations must use Hadoop for the purposes of storing, analyzing and acting upon semistructured and unstructured data.
But is that really true?
The relationship between tools and data strategy
Not really. There are scores of powerful Hadoop alternatives, with more arriving frequently. In March of 2013, Wiley published my fifth book Too Big to Ignore. In the span of less than three years, the import of big data has only increased – as has the number of viable tools around it. New technologies such as Spark and Presto simply weren't on the radars of most big-data types back then.
It's critical to evaluate different tools, however, by more than claims about their sheer processing power and their delivered functionality. Just as important (if not more so these days) are these tools' development communities and user bases – collectively termed ecosystems. And Hadoop's ecosystem is certainly impressive in its depth and breadth.
Still, big data is much more akin to Android than iOS. That is, it's wide open, not unlike the Wild Wild West – and that's not likely to change anytime soon. Because of Hadoop's open-source nature, anyone can fork it. (Quantcast is a case in point.)
The questions are more important than the answers
Based on my research, intelligent organizations today don't frame the discussion about big data exclusively around tools such as Hadoop. You're not likely to hear the following in a meeting at Netflix:
Executive #1: Item #1 on the agenda. Are we using Hadoop?
Executive #2: Yes.
Executive #1: Phew! I was worried. Good! On to item #2.
Rather, you're more likely to hear the following:
Executive #1: We need to make sense out of an increasing number of data sources. We're doing this now, but how can we do this better today and in the future? What are we not capturing? What do we not know? And how can we visualize this information so that everyone will understand it and ask better questions?
Simon Says: Hadoop is a means to an end.
Hadoop may be the elephant in the room (pun intended), but its presence alone does not guarantee meaningful business results. By the same token, it's essential for the increasing number of organizations that evidently "get" big data to move beyond relational databases, standard reports, KPIs and the like.
Hadoop – or whatever else an organization has deployed – is a means to an end. It can certainly help, but it guarantees nothing if icky things like cultural resistance to data stand in its way.
And those pithy statements are the best place to begin a discussion about building an effective data strategy.
Feedback
What say you?
I hope that you enjoyed this series.