In the past few posts we examined opportunities for designing a service layer for master data. The last note looked at the interfacing requirements between the master layer and the applications lying above it in the technology stack.
Exposing access to master data through the services approach opens new vistas for master data integration, particularly in terms of the load placed on the system. Again, this sheds some light on the gaps in the “consolidation approach” to MDM when the repository is engineered solely as a target system, not as both a target and a supplier of data.
In those implementations, the master repository may not be adequately resourced to handle a growing load of consumer applications retrieving or updating the data. Yet slow responses to application requests for data will prove to be the system’s undoing unless other alternatives are considered for syndicating master data in a way that provides a current, synchronized view while delivering access rapid enough to satisfy business application expectations.
I have been thinking about this for some time, and two ideas come to mind. The first is replication: making copies of the master repository, publishing those copies to each of the consuming applications, and regularly refreshing them. While at first blush this may sound appealing, there are some obvious reasons to question the soundness of this approach. First, extracted copies will be difficult to control – the consuming applications may recast the data into different formats and store that copy locally (in internal tables, or by copying the data into application tables), creating a risk of asynchrony and inconsistency. Second, it poses the problem of how best to forward updates to the master records, how those updates are fed back to the copies, and the time frames for doing so.
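To make the mechanics concrete, here is a minimal sketch of that refresh cycle (in Python, with hypothetical names – this is an illustration of the pattern, not anyone’s production design). Note how each consumer copy is only as fresh as its last refresh; that staleness window is exactly where the asynchrony risk creeps in:

```python
import copy
import time
from dataclasses import dataclass, field


@dataclass
class MasterRepository:
    """The authoritative store of master records, keyed by record id."""
    records: dict = field(default_factory=dict)

    def snapshot(self) -> dict:
        # Deep-copy so the published extract is a true point-in-time copy,
        # fully decoupled from subsequent changes to the master.
        return copy.deepcopy(self.records)


@dataclass
class ConsumerCopy:
    """A consuming application's local replica, refreshed periodically."""
    name: str
    records: dict = field(default_factory=dict)
    last_refresh: float = 0.0

    def refresh(self, snapshot: dict) -> None:
        # Overwrite the local copy; anything the application does to its
        # copy between refreshes is invisible to the master and can drift.
        self.records = snapshot
        self.last_refresh = time.time()


def publish(master: MasterRepository, copies: list[ConsumerCopy]) -> None:
    # Push the same point-in-time snapshot to every consumer.
    snap = master.snapshot()
    for consumer in copies:
        consumer.refresh(snap)


# Between publish() calls, updates to the master are not visible to the
# copies -- the staleness window equals the refresh interval.
master = MasterRepository({"cust-1": {"name": "Acme Corp"}})
crm, billing = ConsumerCopy("crm"), ConsumerCopy("billing")
publish(master, [crm, billing])
master.records["cust-1"]["name"] = "Acme Corporation"  # master changes...
assert crm.records["cust-1"]["name"] == "Acme Corp"    # ...but the copy is stale
```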
The second idea attempts to address the problems of the first by leaving the master data in its original location and using a data virtualization layer that federates access to the different master data repositories while exposing a unified canonical model to the consumers. This approach eliminates the problems of asynchrony and inconsistency because all of the applications are effectively looking at a single instance of the master data. Modifications and updates can be transacted through the virtual layer as well, especially if the transactions are serialized “under the hood.”
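As a rough illustration of that idea (again with hypothetical names, and standing in for what would really be a database federation product), a virtualization layer might federate reads across the underlying repositories, recast each source’s native layout into one canonical model, and serialize writes through a single lock so that consumers always see one consistent instance:

```python
import threading
from typing import Callable, Optional


class VirtualMasterLayer:
    """Federates several master data repositories behind one canonical model.

    Each source registers a mapper that recasts its native record layout
    into the shared canonical form; writes are serialized with a lock.
    """

    def __init__(self) -> None:
        # source name -> (store, mapper from native record to canonical record)
        self._sources: dict[str, tuple[dict, Callable[[dict], dict]]] = {}
        self._write_lock = threading.Lock()

    def register_source(self, name: str, store: dict,
                        to_canonical: Callable[[dict], dict]) -> None:
        self._sources[name] = (store, to_canonical)

    def get(self, record_id: str) -> Optional[dict]:
        # Federated read: probe each source and return the first match,
        # recast into the canonical model.
        for store, to_canonical in self._sources.values():
            if record_id in store:
                return to_canonical(store[record_id])
        return None

    def update(self, source: str, record_id: str, record: dict) -> None:
        # Serialize writes "under the hood" so concurrent consumers
        # never observe a partially applied change.
        with self._write_lock:
            store, _ = self._sources[source]
            store[record_id] = record


# Two repositories with different native layouts, one canonical view.
vml = VirtualMasterLayer()
vml.register_source("crm", {"cust-1": {"full_name": "Acme Corp"}},
                    lambda r: {"name": r["full_name"]})
vml.register_source("erp", {"cust-2": {"legal_nm": "Beta LLC"}},
                    lambda r: {"name": r["legal_nm"]})
assert vml.get("cust-1") == {"name": "Acme Corp"}
assert vml.get("cust-2") == {"name": "Beta LLC"}
```

The key design point the sketch shows is that the consumers never touch the native stores directly: every read and write passes through the one layer, which is what keeps the view both unified and consistent.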
Both of these alternatives reflect a different foundational approach to MDM – the replication model adapts the centralized hub approach, while the virtualization model mirrors the “transaction hub” approach. There are benefits and drawbacks to both, and in the next post we will look at a hybrid model that, to some extent, addresses both the performance challenge and the consistency challenge.