In the previous three posts in this series, we talked about the metadata available from source systems, from data transformation and movement, and from operational usage. For this final post in the series, I want to discuss the analytical usage of metadata.
Let's set up the scenario. Imagine I've just been hired to analyze customers, products and revenue. Where would I look for the required data? Most likely, I'd look in:
- Data warehouses.
- Source systems.
- Any operational data stores.
- Master data management sources.
- The big data environment.
The data warehouse may have the data (see the metadata for this enterprise data source), but it may not be updated as frequently as our requirements demand. The source systems are probably out of scope, because we try as much as possible to avoid degrading their performance with analytical workloads. However, we can still gain insight into their data by reviewing their metadata.
The operational data stores, if there are any, may hold integrated and more up-to-date data. Master data management sources may hold only the current view of each customer and product, and our analysis may not require that level of quality. Or does it?
Having reviewed the candidate sources and the latency this requirement demands, we're ready to bring it all together.
If all the data required for this analysis is in the data warehouse and its latency meets the requirement, go there. If you need to pull data from several sources, consider using the big data environment as the landing area for that data. You can usually load the data there quickly to prove out the analysis.
More tools that work with the big data environment become available every day. Just keep in mind that such an environment requires governance and attention to the quality and integrity of the data.
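The decision rule in the previous paragraph can be expressed as a small check over source metadata. What follows is a minimal sketch, not a real tool. It assumes a hypothetical catalog in which each source records the subject areas it holds, its refresh interval in hours, and whether analysts may query it. The names `SourceMetadata` and `choose_sources`, and the sample catalog, are all illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SourceMetadata:
    """Hypothetical catalog entry describing one candidate data source."""
    name: str
    entities: set[str]       # subject areas the source holds, e.g. "customer"
    refresh_hours: float     # how often the data is updated (its latency)
    queryable: bool = True   # False for source systems we must not degrade


def choose_sources(required: set[str], max_latency_hours: float,
                   catalog: list[SourceMetadata]) -> tuple[str, list[str]]:
    """Return (strategy, source names) using only the catalog metadata."""
    # Keep only sources we may query whose refresh meets the latency need.
    fresh = [s for s in catalog
             if s.queryable and s.refresh_hours <= max_latency_hours]

    # Rule 1: the data warehouse alone covers everything, with good latency.
    for s in fresh:
        if s.name == "data warehouse" and required <= s.entities:
            return ("use data warehouse", [s.name])

    # Rule 2: otherwise, greedily pick fresh sources (in catalog order) that
    # cover what is still missing, to be landed in the big data environment.
    picked: list[str] = []
    missing = set(required)
    for s in fresh:
        if s.entities & missing:
            picked.append(s.name)
            missing -= s.entities
    if missing:
        return ("gap: no fresh source for " + ", ".join(sorted(missing)), picked)
    return ("land in big data environment", picked)


# Illustrative catalog; the refresh intervals are invented for the example.
catalog = [
    SourceMetadata("data warehouse", {"customer", "product", "revenue"}, 24),
    SourceMetadata("operational data store", {"customer", "revenue"}, 1),
    SourceMetadata("master data management", {"customer", "product"}, 4),
    SourceMetadata("order entry system", {"revenue"}, 0, queryable=False),
]

needed = {"customer", "product", "revenue"}
print(choose_sources(needed, 48, catalog))  # daily warehouse load is fine
print(choose_sources(needed, 6, catalog))   # warehouse too stale
```

With a 48-hour tolerance the warehouse wins; with a 6-hour tolerance the sketch falls back to landing the operational data store and master data management feeds together. The greedy loop finds a covering set but not necessarily the smallest one, and it ignores data quality, which is exactly the master data question raised earlier.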