Recently, TDWI released "Managing Big Data," a report that explored trends in big data management (BDM). The author, Philip Russom, is an expert in the fields of data warehousing and data management, and for this report he surveyed more than 400 practitioners about their big data efforts.
One thing immediately evident is that the phrase "big data," in the minds of many surveyed, is synonymous with Hadoop. Russom found that Hadoop and its related components (the Hadoop Distributed File System, or HDFS, as well as MapReduce and other components) are considered to be "the software products most aggressively adopted for BDM in the next three years."
The adoption of Hadoop and related big data systems isn't a surprise, but Hadoop's rise to prominence has been swift. In just eight years, Hadoop went from a Yahoo creation to a "do anything" file structure and data repository for all types of information. The TDWI survey gave some insight into why and how people are implementing big data, and Hadoop in particular.
Respondents were asked which database management systems (DBMS) were in use for big data management efforts. Traditional relational DBMSs were at the top, with 38 percent saying they were the primary systems, but Hadoop had pulled even with data appliances at 33 percent apiece (multiple responses were allowed).
Digging into the data a little further, you can see more interesting Hadoop stats. HDFS is in use by 21 percent of those surveyed, with an additional 43 percent expected to move some data to HDFS in the next three years. Similarly, 16 percent of those surveyed are using Hadoop tools besides HDFS, with another 44 percent planning to employ them in the next three years.
For data management professionals, where does the shift to Hadoop, NoSQL DBMS and related big data structures leave us? It's easy to see the oncoming torrent of data and start to panic. Luckily, Russom gives some great guidance in the form of 10 priorities for BDM (see page 35 of the report).
Russom's first point is quite obvious: you have to demand business value when starting a big data initiative. This is important because big data efforts have grown organically, if not chaotically. For many organizations, the goal has been to aggregate information, but reaching some sort of decision or changing a business process was second or third on the priority list.
Russom's advice here is sound. And necessary. You need to demonstrate business value for any IT initiative, whether it's big data or not. Just because you can store massive amounts of data doesn't mean that you will necessarily get a return on it. Storage is now cheap, but answers are always at a premium.
Finding business value is critical because data management helps make sure that data can enhance and improve the business. If you're a data analyst or a data steward, a big data effort can feel like a boulder rolling downhill, and you're tasked with slowing down – or at least cleaning up – that boulder while it's accelerating.
However, there is some good news in the report. It sounds like many of us in the industry have already started to view big data as something other than a "dumping ground." The TDWI survey found that 89 percent of respondents said that "BDM is an opportunity - but only if you seize it." This is great to hear, as data management within the big data world has not been a primary topic thus far.
So, if your organization is heading down the big data path – and you suddenly start to see the word "Hadoop" in many of your meeting requests – take heart. You're not alone. Start asking questions about the need for the effort and how it can change the business. And rely on the data management strategies you already have, applying them to Hadoop or whatever big data structure is underway. That can take BDM from a theory into a reality.