In my last post, I shared some thoughts on the results of a research report I worked on about moving Hadoop into production. I concluded that there are many challenges associated with fully integrating Hadoop into an enterprise infrastructure.
Many organizations are excited about the prospects of Hadoop. In particular, they want to be able to call on the Hadoop ecosystem when they need to use advanced analytics methods that can consume both structured and unstructured data, process the information contained within, and provide real-time guidance in operational environments. Yet it's unlikely that any organization would rip out its existing business intelligence and analytics infrastructure and replace it with a relatively new technology like Hadoop.
Hadoop – A gradual transition
There are two reasons for this. First, even as the tools in the Hadoop stack mature, there are still aspects of conventional data warehousing that they cannot replicate. Second, migrating a working application to Hadoop makes sense only if there is a clear return on the investment of time and the cost of the new platform: either the application itself can be improved, or there is a recognized reduction in the cost to operate it.
This means that for the near to medium term, adoption and integration of frameworks like Hadoop (or insert any of your favorite new technologies here) will be gradual. Over time, some conventional systems will be retired. Others will stay, because they continue to perform as expected, still operate cost-effectively, or are deemed necessary to the enterprise for some other reason. In other words, for the next five to ten years, many organizations will operate in a hybrid environment that spans radically different technology philosophies.
Three things to consider
So what does this really mean from a practical perspective? Here are three obvious considerations for ensuring that heritage systems can operate with newer technologies.
- Backwards compatibility for interoperability. Information architectures will include many types of data management systems, spanning traditional structured data sets, NoSQL data, and unstructured data in a variety of forms (such as free text or streaming audio). That means adding text analytics, data streaming, and real-time analytics to the hybrid environment in a way that interoperates with legacy system designs.
- Federation and virtualization to bridge data access. Organizations will increasingly rely on shared storage platforms for their information management needs, including cloud storage. At the same time, applications will need to access data from multiple systems in a way that remains transparent to the end user, making data federation and virtualization key components of the enterprise architecture.
- Data curation for data assets in the data lake. Because analysts and data scientists will increasingly demand access to data in its raw form for their advanced analytics, they will need access to curated data lakes with data asset catalogs that convey semantic information about each asset's contents.
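To make the last point a little more concrete, here is a minimal sketch of what a data asset catalog entry might look like. Everything here is hypothetical — the field names, the `CatalogEntry` structure, and the `find_by_tag` helper are illustrative choices, not a real catalog product's schema — but it shows the core idea: each raw asset in the lake carries enough semantic metadata (source system, description, business tags) for an analyst to discover it without opening the files.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One data asset in the lake, described by semantic metadata (hypothetical schema)."""
    name: str           # asset identifier, e.g. a table or directory name
    location: str       # physical path or URI in the lake
    format: str         # raw storage format: csv, json, avro, ...
    source_system: str  # the legacy or new system the data came from
    description: str    # human-readable summary of the contents
    semantic_tags: list = field(default_factory=list)  # business concepts the asset covers

def find_by_tag(catalog, tag):
    """Return every asset whose semantic tags include the given business concept."""
    return [entry for entry in catalog if tag in entry.semantic_tags]

# A tiny two-asset catalog spanning a legacy warehouse extract and a raw stream dump.
catalog = [
    CatalogEntry("orders_2015", "/lake/raw/orders/2015/", "csv",
                 "legacy_dw", "Daily order extracts from the warehouse", ["sales", "orders"]),
    CatalogEntry("clickstream", "/lake/raw/web/clicks/", "json",
                 "web_logs", "Raw site clickstream events", ["web", "behavior"]),
]

# An analyst looking for sales-related raw data searches by business concept,
# not by file path — the catalog bridges the semantic gap.
sales_assets = find_by_tag(catalog, "sales")
```

The design choice worth noting is that discovery happens through business-level tags rather than physical locations, which is what lets raw, uncurated files coexist with governed warehouse extracts in the same hybrid environment.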