Managing big data, Part 1: Technical questions

My, have times changed.

When it comes to making sense out of data, there have never been more things to think about. Put differently, the notions of data and managing it have changed considerably over the past five years, a point that I make in Too Big to Ignore.

In this two-part series, I'll provide a very simple framework for understanding and succeeding in this new era.

A very brief history of traditional data management

Go back 25 years. Many organizations followed essentially the same "on-premise" data-management playbook as the rest. Starting in the early 1990s, "terminals" and mainframe databases started to give way to distributed computing and client-server architectures. Back-office and ERP systems from vendors such as SAP, PeopleSoft, Oracle and others made it easier for rank-and-file employees to input and report on key enterprise data from different locations and even different countries.

Today, massive, multimillion-dollar on-premise system implementations are few and far between. Thanks to the rises of cloud computing, its cousin SaaS and open-source software, the game has changed in many instances. Companies can deploy far lighter applications such as Workday, Salesforce and others in a fraction of the time and cost.

The upending of contemporary data management

Despite these massive technological changes, the job of "managing" enterprise data continues to fall on database administrators (DBAs) in many if not most large, mature organizations. (I'll leave aside startups and small businesses here.) Also, some companies outsource data management and recovery to "experts."

For all of their power, relational databases such as Oracle, SQL Server and DB2, simply weren't built to house – much less interpret – the types of data being generated today. Traditional data marts and data warehouses can quickly fill up thanks to massive increases in the sheer size and volume of data. Fortunately, data storage has never been cheaper. But these days there are far more technical questions than answers:

The rise of cloud computing means that organizations face more data-oriented options than ever, but some of these services are of questionable value. How do you select the "right" one without getting hoodwinked?
Should we buy, build, rent or steal (re: go the open-source route)?
If a relational database isn't the answer for "managing big data," then what is? Is it Hadoop? A graph database? Another NoSQL "solution?" All of the above? A combination? Neither?
What are the new security challenges? What about BYOD?
How does our organization integrate the new stuff with the old stuff? Is one-stop shopping even possible?
Yeah, speed matters, but what constitutes acceptable latency?

Simon Says

I could go on with quasi-technical queries such as these. Brass tacks: the arrival of big data means that there are far more questions than answers.

How to be successful? Realize two things: First, it's far better to ask them before starting on the journey. Second, hindsight remains 20/20. You will learn along the way.