Data on demand or data as a service?


One of our clients is exploring the concept of data as a service as an additional option for providing data accessibility to different business functions in the enterprise. But when it comes to data as a service as an architectural pattern, I believe there is some confusion as to what it constitutes – and how the concept is envisioned, designed, architected and put into production.

First, let’s consider three motivating factors for this client:

  • Cloud storage. There's a push for the organization to migrate their data and computation to the cloud. Some projects in the organization have already adopted cloud-based storage as their preferred mode for data persistence, and it appears that other projects will follow suit. A quick note here: given the right design and data layout, there seem to be some clear economic benefits of using cloud storage. These are hard to ignore when considering what might be deemed a drastic architectural change.
  • Cost management for compute resources. There is growing affinity for virtualized computing resources managed in the cloud as well. Managing hardware within their own data center continues to impose ongoing operations costs that they believe can be reduced or eliminated as they migrate applications to cloud-based computing systems.
  • Legacy modernization. There are a number of decades-old systems that are mainframe-bound, and there's a push to move off the mainframe and migrate historical data away from COBOL files to a more easily accessed data platform.

Sound familiar? We've seen this pattern at a number of client organizations over the past few years. Indeed, there are obvious additional benefits of combining migration to virtual cloud-based resources with modernization of old-fashioned data files. From a practical standpoint, it makes a lot of sense. The lowered costs of big data management and analytics are, it seems, prompting organizations to expand data reuse – with the expectation that doing so will lower overall costs and level of effort for ongoing and newly developed systems.

The challenge arises when considering the models for data management once legacy data has been migrated to the cloud environment. Existing data consumers are accustomed to their simple data extract model: submit a request for a data extract to the mainframe system, then wait a few weeks while a job is configured and run and the extract file is produced and transmitted.

But this traditional model is less suited to a world that moves rapidly. Analysts who have been trained on end-user access and visualization tools have much less patience. While they might be willing to wait for a configured query to run and return within a much shorter time frame, the newer application development methodologies expect simpler and faster access. This is where the concept of data as a service comes in – facilitating data accessibility by simplifying the methods through which information is accessed, using a focused framework of API interfaces.

This suggests a simple example: providing access to legacy data loaded onto a cloud-based storage system. If you simply copied the data sets from your on-premises system and dumped them into a cloud-based data lake, you would not have improved data usability. But if you were to reconfigure the legacy data by transforming the individual data instances into objects using JSON or XML, and then organize those objects based on common access criteria (such as “location”), you could configure a REST API that could easily support accesses by the selected criteria. In turn, more complex access patterns can be engineered to align with the REST paradigm, which essentially allows you to continually build out and publish service capabilities.
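To make the example concrete, here is a minimal Python sketch of the two steps described above: transforming flat legacy records into JSON objects, then organizing them by a common access criterion such as "location" so that a REST endpoint (say, GET /customers?location=CHICAGO) reduces to a simple indexed lookup. The record layout, field names, and endpoint are hypothetical, not taken from any particular system.

```python
import json

# Hypothetical flat records as they might arrive from a legacy extract
# (e.g., fixed-width COBOL file rows parsed into tuples).
LEGACY_RECORDS = [
    ("C1001", "ACME CORP", "CHICAGO"),
    ("C1002", "GLOBEX", "AUSTIN"),
    ("C1003", "INITECH", "CHICAGO"),
]

def to_json_objects(records):
    """Transform flat legacy rows into JSON-style objects."""
    return [
        {"customer_id": cid, "name": name, "location": loc}
        for cid, name, loc in records
    ]

def index_by(objects, key):
    """Organize objects by a common access criterion (e.g., 'location')."""
    index = {}
    for obj in objects:
        index.setdefault(obj[key], []).append(obj)
    return index

def get_by_location(index, location):
    """What a REST handler for GET /customers?location=... would return."""
    return json.dumps(index.get(location, []))

objects = to_json_objects(LEGACY_RECORDS)
by_location = index_by(objects, "location")
print(get_by_location(by_location, "CHICAGO"))
```

The key design point is that each new access pattern (by region, by account status, and so on) is just another index and endpoint layered on the same objects – new services can be published without disturbing the ones already deployed.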

It is this last point that, in my opinion, is the essence of data as a service. That is, the ability to continually deploy new services without disrupting existing applications, all geared toward expanding accessibility and data reuse.


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author regarding data management best practices, through numerous books, white papers, and web seminars. His book, Business Intelligence: The Savvy Manager's Guide (June 2003) has been hailed as a resource allowing readers to "gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together." His book, Master Data Management, has been endorsed by data management industry leaders. David is also the author of The Practitioner's Guide to Data Quality Improvement.
