I've seen a number of articles and webinars recently that discuss data integration as a cloud-based service. So I thought it was worth exploring what this really means in the context of big data – specifically when the objective is to exploit many sources of streaming data for analytics. My initial reaction
There is no doubt about it – over the past few years there has been a monumental shift in how we think about “enterprise” data management. I believe this shift has been motivated by four factors: Open data. What may have been triggered by demands for governmental transparency and the need
Many people perceive big data management technologies as a “cure-all” for their analytics needs. But I would be surprised if any organization that has invested in developing a conventional data warehouse – even on a small scale – would completely rip that data warehouse out and immediately replace it with a NoSQL
I am currently cycling through a schema-on-read data modeling process on a specific task for one of my clients. I have been presented with a data set and have been asked to consider how that data can be best analyzed using a graph-based data management system. My process is to
In my prior two posts, I explored some of the issues associated with data integration for big data, particularly the conceptual data lake in which source data sets are accumulated and stored, awaiting access from interested data consumers. One of the distinctive features of this approach is the transition
In my last post, I noted that the flexibility of the schema-on-read paradigm typical of a data lake had to be tempered with the use of a metadata repository so that anyone wanting to use that data could figure out what was really in
A few of our clients are exploring the use of a data lake as both a landing pad and a repository for collection of enterprise data sets. However, after probing a little bit about what they expected to do with this data lake, I found that the simple use of
The data governance “industry” thrives on a curious dichotomy. On the one hand, some service providers insist to clients that they need a data governance program, that they must create a data governance council and that they should immediately staff a collection of roles ranging from data governance council member
Operationalizing data governance means putting processes and tools in place for defining, enforcing and reporting on compliance with data quality and validation standards. There is a life cycle associated with a data policy, which is typically motivated by an externally mandated business policy or expectation, such as regulatory compliance.
In recent years, we practitioners in the data management world have been pretty quick to conflate “data governance” with “data quality” and “metadata.” Many tools marketed under “data governance” have emerged – yet when you inspect their capabilities, you see that in many ways these tools largely encompass data validation and data standardization. Unfortunately, we