In the past few postings we looked at some of the issues that emerge as a result of uncontrolled creation of data warehouses and data marts. I suggested that the goals of a data warehouse consolidation project should not only include the creation of a new data asset that accommodates the users of the “to-be-consolidated” warehouses and marts, but that those systems should be retired and replaced by the new asset.
The corollary conclusions included a list of key success criteria to ensure that the consolidation actually results in having the new system be an effective replacement for all of the migrated sources.
But as you probably know, I have an ulterior motive in mind – my series of posts is less about consolidation and more about the tools we can use to ensure that the success criteria are met. That being said, I believe that master data management (MDM) can act as a valuable toolkit for supporting data warehouse consolidation since it can help satisfy a number of the suggested success criteria:
- Elimination of duplication: MDM systems are engineered around precise algorithms for searching and matching sets of identifying attributes across collections of data instances. These algorithms are sophisticated enough to allow for variance in the identifying attribute values, thereby providing “fuzzy” matching, presuming well-defined scoring mechanisms and acceptability thresholds.
- Consistency in deduplication: By employing the master data management systems as the arbiter for determination of entity information duplication, one can enforce a standard, consistent set of rules to be applied in determination, no matter the sources. This establishes consistency when multiple data warehouses are subjected to the deduplication and subsequent merge/purge processes.
- Data validation and standardization: One can use the MDM system as a clearinghouse for the collection of data quality validation and standardization rules that had been applied to the data warehouses that are subjected to consolidation. Because entity data or master reference data is filtered through the MDM system to ensure that duplicate data is not inadvertently created by source data systems, it also can be the most appropriate location for applying validation and standardization rules from across the user community.
- Synchronization: After an initial consolidation is completed, the issue of managing synchrony with the data sources that populate the renovated data warehouse framework still remains. The MDM system can be configured to enable shared services to apply identity resolution to any newly created source records and identify any potential matches with existing entity identities that are managed within the master data index. Any updates to the master index and repository can either be periodically published to the new consolidated warehouse, or if there is a greater need for closer to real-time synchronization, updates may be trickle-fed directly from the master repository and profile to the data warehouse. A timeliness service level agreement can be established whose compliance can be monitored within the audit trail of the master data system.
- Semantic consistency: MDM systems that are used for identity resolution and application of survivorship rules must be configured very carefully to ensure that semantics are not violated, and this means that the same techniques can be applied for data warehouse consolidation.