In a recent post, we discussed a number of factors that have expanded the reach of organizations' data management functions beyond the traditional scope of an in-house data center. Increased use of external data sources, coupled with growing dependence on cloud computing, has created an emerging need to exercise some level of control over the use of that data.
Just to clarify: I talk about control over the use of data rather than data management because we have to differentiate between the times when we can exercise rights of management and the times when our use of a data set is a byproduct of its availability. For example, your organization can contract with a social media company for access to a firehose data feed, but you can't force the individuals posting to that social media channel to abide by your company's data quality rules!
Recall the three concerns I noted in my earlier post: data management, data accessibility and data governance. In essence, extra-enterprise data management reframes how these concerns are to be addressed. The difference between traditional data management and extra-enterprise data management lies in the difference between managing the creation, accumulation and storage of data (the old world) and the consumption of data (the new world).
In the old world, data management focuses on the formats of data values, the structure of data models and the organization of structured data as a byproduct of internal processes. In the new world, we can't control any of these aspects of data from alternate sources. Instead, we have to focus on:
- Cataloging data sources.
- Profiling the data and discovering patterns to enable downstream consumption.
- Virtualizing models to enable access.
- Monitoring for changes so we can react rapidly and assure that downstream users' work will not be severely affected by those changes.
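The profiling and monitoring items above can be sketched together: profile incoming records to discover which types appear in each field, then fingerprint that profile so a change between runs signals drift in the source. This is a minimal illustration, assuming JSON-like records from a hypothetical external feed; the field names and sample data are invented for the example.

```python
import json
from collections import defaultdict

def profile_records(records):
    """Count, per field, which value types appear and how often."""
    profile = defaultdict(lambda: defaultdict(int))
    for record in records:
        for field, value in record.items():
            profile[field][type(value).__name__] += 1
    return {field: dict(types) for field, types in profile.items()}

def schema_fingerprint(profile):
    """A stable serialization of the profile; if the fingerprint changes
    between runs, the source's structure has drifted."""
    return json.dumps(profile, sort_keys=True)

# Illustrative records, as they might arrive from an external feed.
sample = [
    {"user": "a1", "followers": 120, "verified": False},
    {"user": "b2", "followers": 85, "verified": True},
    {"user": "c3", "followers": "unknown", "verified": False},  # drift!
]

profile = profile_records(sample)
# The mixed types under 'followers' are exactly the kind of change
# a monitoring step would surface to downstream consumers.
```

A real catalog would persist the fingerprints over time and alert on differences, but the core idea is the same: consume first, observe structure, and react when it shifts.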
Accessibility changes as well: it is no longer enough to provide connectors or ODBC/JDBC bridges to existing databases. Instead, utility capabilities must be able to adapt to different representations, and end-user data analysis and reporting tools must be able to rapidly accommodate source data in variant and unstructured formats.
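One way to picture that adaptability is a small normalizer that accepts source data in variant formats and yields a common record shape for downstream tools. The format labels and fields here are illustrative assumptions, not a real tool's API.

```python
import csv
import io
import json

def read_records(raw, fmt):
    """Normalize raw source text in one of several representations
    into a common list-of-dicts record shape."""
    if fmt == "json-lines":
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported format: {fmt}")

# Two representations of comparable source data.
csv_raw = "user,score\na1,10\nb2,7\n"
jsonl_raw = '{"user": "a1", "score": 10}\n{"user": "b2", "score": 7}\n'

# Both yield records with the same fields, so reporting tools
# downstream never need to know which representation arrived.
```

Note that the CSV path leaves values as strings, which is precisely why profiling and type discovery matter before blending sources.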
Governance becomes a challenge as well. From the data consumer's perspective, the objective is to apply any standardizations and necessary transformations when the data is accessed, not necessarily when it is stored. That provides greater agility in absorbing new data sets and blending data from different sources to achieve analytical modeling goals.
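Transforming at access time rather than at storage time can be sketched as a schema-on-read step: the stored record stays exactly as it was received, and standardization rules are applied only when a consumer reads it. The rules and field names below are hypothetical examples.

```python
def standardize(record, rules):
    """Apply per-field standardization rules at access time,
    leaving the stored source record untouched."""
    return {
        field: rules.get(field, lambda v: v)(value)
        for field, value in record.items()
    }

# Stored as received from the source, unmodified.
stored = {"country": "usa", "amount": "1,250.00"}

# Standardizations the consumer wants, applied on read.
rules = {
    "country": str.upper,
    "amount": lambda v: float(v.replace(",", "")),
}

accessed = standardize(stored, rules)
# accessed == {"country": "USA", "amount": 1250.0}, while 'stored'
# is unchanged -- new rules can be adopted without rewriting storage.
```

Because nothing is rewritten at rest, a new data set can be absorbed immediately and its standardization rules refined later, which is where the agility comes from.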
These are just a few preliminary thoughts. In upcoming posts, we will delve deeper into the conversation about greater agility as data horizons continue to broaden.