Many medium-to-large organizations have a long history of developing systems for reporting and analysis. Over time, these organizations have evolved a set of basic processes that “pump out” a series of data marts used by specific business functions to generate reports. But where data warehouses and data marts used for reporting and analysis proliferate – with the same data sources copied into disparate systems – certain issues typically crop up, such as:
- Seemingly identical data with latent, subtle differences
- Replicated data sets leading to increased demands for storage
- Synchronization issues that are the root cause of data inconsistency
- Variant sets of rules applied at different times in different places for validation, standardization, and correction
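The first and last of these issues feed each other: when each mart applies its own standardization rules, copies of the same source record drift apart. The sketch below illustrates this with two hypothetical standardization routines (the rules and field names are invented for illustration) applied to one source record.

```python
# Illustrative sketch (hypothetical rules): the same source record passed
# through two marts' standardization routines yields subtly different copies.

def standardize_mart_a(record):
    # Mart A uppercases names and strips formatting from phone numbers.
    return {
        "name": record["name"].upper(),
        "phone": "".join(ch for ch in record["phone"] if ch.isdigit()),
    }

def standardize_mart_b(record):
    # Mart B title-cases names and keeps the phone number as entered.
    return {
        "name": record["name"].title(),
        "phone": record["phone"],
    }

source = {"name": "o'brien, pat", "phone": "(555) 123-4567"}

copy_a = standardize_mart_a(source)
copy_b = standardize_mart_b(source)

# The two marts now hold "the same" customer with latent differences:
# a naive equality check fails even though both derive from one source.
print(copy_a == copy_b)  # False
```

Neither copy is wrong by its own mart's rules, which is precisely why the differences stay latent until someone tries to join or reconcile the two silos.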
This profusion of replicated data sets increases complexity for the users of these data marts, who must know which data marts hold which data sets – and it grows worse when downstream applications must also be aware of the multiple data silos from which to select their data.
A separate issue arises from slight differences in the data models used by different data warehouses and data marts. In fact, the absence of a feature or capability in an existing data warehouse may have been the original motivation for creating a separate system to support a particular set of business needs.
The objective of a data warehouse consolidation project is to address the business process gaps and failures that result from these issues. The approach involves examining the data warehouses to identify overlaps and differences, designing models that can accommodate the union of their data sets, and then ensuring that the new consolidated warehouse can be populated in a way that maintains consistency for all users of the data marts and warehouses to be retired.
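The overlap-and-difference analysis described above can be sketched as simple set operations over record keys. This is a minimal illustration, assuming (hypothetically) that each mart's customer table has been extracted to a dictionary keyed by a shared natural key; the sample data is invented.

```python
# A minimal sketch of the overlap analysis step in a consolidation project,
# assuming each mart's customer table is keyed by a shared natural key.

mart_a = {
    "C001": {"name": "ACME CORP", "region": "EAST"},
    "C002": {"name": "GLOBEX", "region": "WEST"},
}
mart_b = {
    "C002": {"name": "Globex Inc.", "region": "WEST"},
    "C003": {"name": "Initech", "region": "SOUTH"},
}

only_a = mart_a.keys() - mart_b.keys()   # records to migrate from mart A alone
only_b = mart_b.keys() - mart_a.keys()   # records to migrate from mart B alone
shared = mart_a.keys() & mart_b.keys()   # overlap requiring reconciliation

# Within the overlap, flag records whose attribute values disagree; these are
# the candidates for a reconciliation rule in the consolidated model.
conflicts = {k for k in shared if mart_a[k] != mart_b[k]}

print(sorted(only_a), sorted(only_b), sorted(conflicts))
```

In practice this analysis runs over full schemas and far larger volumes, but the logic is the same: partition records into disjoint, overlapping-and-consistent, and overlapping-but-conflicting sets, and resolve the last group before the source marts are retired.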