One of our clients is exploring the concept of data as a service as an additional option for providing data accessibility to different business functions in the enterprise. But when it comes to data as a service as an architectural pattern, I believe there is some confusion as to what it constitutes – and how the concept is envisioned, designed, architected and put into production.
First, let’s consider three motivating factors for this client:
- Cloud storage. There's a push for the organization to migrate its data and computation to the cloud. Some projects in the organization have already adopted cloud-based storage as their preferred mode of data persistence, and it appears that other projects will follow suit. A quick note here: given the right design and data layout, cloud storage appears to offer clear economic benefits, and those benefits are hard to ignore when weighing what might otherwise be deemed a drastic architectural change.
- Cost management for compute resources. There is growing affinity for virtualized computing resources managed in the cloud as well. Managing hardware in its own data center continues to impose ongoing operational costs that the organization believes can be reduced or eliminated as applications migrate to cloud-based computing systems.
- Legacy modernization. There are a number of decades-old systems that are mainframe-bound, and there's a push to move off the mainframe and migrate historical data away from COBOL files to a more easily accessed data platform.
Sound familiar? We've seen this pattern at a number of client organizations over the past few years. Indeed, there are obvious additional benefits of combining migration to virtual cloud-based resources with modernization of old-fashioned data files. From a practical standpoint, it makes a lot of sense. The lowered costs of big data management and analytics are, it seems, prompting organizations to expand data reuse, with the expectation that doing so will lower overall costs and level of effort for both ongoing and newly developed systems.
The challenge arises when considering the models for data management once legacy data has been migrated to the cloud environment. Existing data consumers are accustomed to a simple data extract model: submit a request for a data extract to the mainframe team, then wait a few weeks while a job is configured and run and the extract file is produced and transmitted.
But this traditional model is ill suited to a world that moves rapidly. Analysts who have been trained on end-user access and visualization tools have much less patience. While they might be willing to wait for a configured query to run and return within a much shorter time frame, newer application development methodologies expect simpler and faster access. This is where the concept of data as a service comes in: it aims to facilitate data accessibility by simplifying the methods through which information is accessed, using a narrowed framework of API interfaces.
This suggests a simple example: providing access to legacy data loaded onto a cloud-based storage system. If you simply copied the data sets from your on-premises system and dumped them into a cloud-based data lake, you would not have improved data usability. But if you were to reconfigure the legacy data by transforming the individual data instances into objects using JSON or XML, and then organizing those objects based on common access criteria (such as “location”), you could configure a REST API that easily supports access by the selected criteria. In turn, more complex access patterns can be engineered to align with the REST paradigm, which essentially allows you to continually build out and publish service capabilities.
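To make this concrete, here is a minimal sketch in Python of what that reconfiguration might look like, using Flask purely for illustration. The fixed-width record layout, the field names and the endpoint path are all hypothetical assumptions, and a real implementation would read the transformed objects from cloud object storage rather than from an in-memory index.

```python
# A minimal sketch, not a production design: legacy fixed-width records are
# transformed into JSON-friendly objects, indexed by a "location" field, and
# exposed through a single REST endpoint. The record layout, field names and
# storage approach are assumptions for illustration only.
from collections import defaultdict

from flask import Flask, abort, jsonify  # pip install flask


def parse_legacy_record(line: str) -> dict:
    """Turn one fixed-width, mainframe-style record into a dictionary."""
    return {
        "customer_id": line[0:10].strip(),
        "name": line[10:40].strip(),
        "location": line[40:60].strip(),  # the common access criterion
        "balance": float(line[60:72].strip() or 0),
    }


def index_by_location(lines) -> dict:
    """Group the transformed objects by the chosen access criterion."""
    index = defaultdict(list)
    for line in lines:
        record = parse_legacy_record(line)
        index[record["location"]].append(record)
    return index


# In practice the JSON objects would live in cloud object storage (for example,
# one document per location); an in-memory index keeps the sketch short.
SAMPLE_EXTRACT = [
    f"{'1':<10}{'John Smith':<30}{'Austin':<20}{'1234.56':>12}",
    f"{'2':<10}{'Jane Doe':<30}{'Boston':<20}{'987.65':>12}",
]
CUSTOMERS_BY_LOCATION = index_by_location(SAMPLE_EXTRACT)

app = Flask(__name__)


@app.route("/customers/location/<location>")
def customers_by_location(location: str):
    """Return all customer objects for a given location."""
    records = CUSTOMERS_BY_LOCATION.get(location)
    if records is None:
        abort(404)
    return jsonify(records)


if __name__ == "__main__":
    app.run(port=8080)  # e.g., GET /customers/location/Austin returns a JSON array
```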
It is this last point that, in my opinion, is the essence of data as a service. That is, the ability to continually deploy new services without disrupting existing applications, all geared toward expanding accessibility and data reuse.
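As a sketch of that idea, and continuing the hypothetical Flask example above, a new capability can be published as a separate blueprint without touching the existing location endpoint or the applications already built against it.

```python
# Continues the earlier sketch (assumes its `app` and CUSTOMERS_BY_LOCATION).
# A new, narrower capability is added as a separate blueprint, so the existing
# /customers/location/... endpoint is left untouched.
from flask import Blueprint, abort, jsonify

balance_api = Blueprint("balance_api", __name__)


@balance_api.route("/customers/<customer_id>/balance")
def customer_balance(customer_id: str):
    """Return just the balance for a single customer."""
    for records in CUSTOMERS_BY_LOCATION.values():  # reuses the earlier index
        for record in records:
            if record["customer_id"] == customer_id:
                return jsonify({"customer_id": customer_id,
                                "balance": record["balance"]})
    abort(404)


# Registering the blueprint publishes the new service alongside the old one.
app.register_blueprint(balance_api)
```

If this lived in the same module as the earlier sketch, the registration would simply go before app.run(); the point is that each new access pattern is an addition, not a change to what consumers already use.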