Historically, before data was managed it was moved to a central location. For a long time that central location was the staging area for an enterprise data warehouse (EDW). While EDWs and their staging areas are still in use – especially for structured, transactional and internally generated data – big data has given rise to another central location, namely the data lake. As a storage repository for vast amounts of raw data in its native format, a data lake often contains data that is semistructured or unstructured, nontransactional or event-driven, and externally generated.
Centralizing data before it’s managed has its advantages. The main benefit is that there’s only one place where the enterprise has to build and maintain the processes for cleansing, deduplicating, transforming and structuring data. Of course, this approach is based on the assumption that most of the data business users consume will be sourced from this data management hub. But these days business users have access to an abundance of alternative data sources, both within and outside of the enterprise.
How managing data where it lives can help with governance, quality and more
The alternative to moving data to a data management hub for processing is to build services around where the data lives (e.g., in-cloud, in-database, in-memory, in-stream). In the past, this raised legitimate concerns about creating application silos, because the data became tightly coupled with the data management processes built around it. But over the last decade the data management software industry has largely shifted to a service-oriented, build-it-once-reuse-it-everywhere delivery and deployment model for data management processes. Creating data management processes once and reusing them gives the enterprise a standard, repeatable method for managing data.
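To make the build-it-once-reuse-it-everywhere idea concrete, here is a minimal sketch in Python. It assumes a simple record format (a dictionary with an "email" field), and the names standardize_email, deduplicate and manage are hypothetical, chosen only for illustration; they do not come from any particular product. The same process is defined once and then applied to data held in memory and to data arriving as a stream, without first copying either into a central hub.

```python
from typing import Iterable, Iterator

# Assumed record shape for this sketch: a dict of string fields.
Record = dict[str, str]


def standardize_email(record: Record) -> Record:
    """Cleansing rule: trim whitespace and lowercase the email field."""
    cleaned = dict(record)  # copy so the source record is left untouched
    cleaned["email"] = record.get("email", "").strip().lower()
    return cleaned


def deduplicate(records: Iterable[Record], key: str) -> Iterator[Record]:
    """Keep only the first record seen for each value of `key`."""
    seen: set[str] = set()
    for record in records:
        value = record[key]
        if value not in seen:
            seen.add(value)
            yield record


def manage(records: Iterable[Record]) -> Iterator[Record]:
    """The single, reusable process: cleanse, then deduplicate.

    It accepts any iterable, so it does not care where the data lives.
    """
    cleansed = (standardize_email(r) for r in records)
    return deduplicate(cleansed, key="email")


# Source 1: data that already sits in memory.
in_memory = [{"email": " Ann@Example.com "}, {"email": "ann@example.com"}]


# Source 2: a stand-in for an event stream, produced one record at a time.
def event_stream() -> Iterator[Record]:
    yield {"email": "BOB@example.com"}
    yield {"email": "bob@example.com "}
    yield {"email": "cy@example.com"}


memory_result = list(manage(in_memory))
stream_result = list(manage(event_stream()))

print(memory_result)
print(stream_result)
```

Because manage depends only on being able to iterate over records, the rule set is written and maintained in one place, while each source stays where its consumers already expect to find it. A real deployment would likely expose such a process as a service rather than a local function, but the reuse pattern is the same.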
When data does not have to be moved before it's managed, the data continues to be available at the same place consumers are accustomed to accessing it. This not only makes improved data more accessible to business users – it also eliminates the overhead of training them on how to use new applications and interfaces. Another fundamental issue with moving data to another location to perform data management tasks is the disconnect it creates between sourced data and managed data. And since governance and management go hand in hand, it also creates a disconnect between sourced and governed data.
When data stays put, the number of places where it must be managed and governed is minimized, which makes it easier to enact and enforce data governance policies and procedures. Addressing key data governance issues, like establishing and sustaining the right level of data quality, privacy and security, is often a necessary disruption to daily business activities. But data governance becomes less disruptive when governance happens at home, so to speak – that is, when governance works where data lives. The bottom line: Managing data where it lives with in-data processing minimizes data movement, improves data quality and sustains data governance.