The challenge of synchronizing the master index


What really happens when a new entity is added to a master data environment? First of all, this is only done when it is determined that the entity is not already known to exist within the system. Abstractly, a significant amount of the process involves adding the new entity into the master index. At the very least this involves creating a new master repository record to hold (at a minimum) the identifying data element values and generating a unique identifier based on those data element values.
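The registration step described above can be sketched in a few lines. This is a minimal illustration, not an actual MDM implementation: the `MasterIndex` class, its attribute-hashing scheme for generating the unique identifier, and all names are my own assumptions (real products typically assign surrogate keys and perform far more sophisticated duplicate detection).

```python
import hashlib

class MasterIndex:
    """Minimal sketch of a master index mapping identifying data to a unique ID."""

    def __init__(self):
        self.records = {}  # unique_id -> identifying data element values

    def _make_id(self, attrs):
        # Derive a deterministic identifier from the identifying data element
        # values (one illustrative choice; many systems use surrogate keys).
        canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
        return hashlib.sha256(canonical.encode()).hexdigest()[:16]

    def register(self, attrs):
        """Add a new entity only if it is not already known; return its ID."""
        uid = self._make_id(attrs)
        if uid not in self.records:          # entity not already known
            self.records[uid] = dict(attrs)  # create the master record
        return uid

index = MasterIndex()
uid1 = index.register({"first": "Ann", "last": "Lee", "dob": "1980-04-02"})
uid2 = index.register({"first": "Ann", "last": "Lee", "dob": "1980-04-02"})
assert uid1 == uid2  # identical identifying values resolve to one record
```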

At the same time, though, this may incur additional processing. For example, as new data element values are added to the environment, the frequencies with which data values occur may change, which will alter the probabilities associated with matching a particular set of values to known entities, which in turn may modify the tuning parameters and weights associated with the identity resolution algorithms. And since these adjustments depend on overall data value frequencies, this suggests that you should wait until multiple new entities are queued up to be added to the environment before updating the master index and the identity resolution parameters.
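To see why value frequencies feed back into the matching weights, consider a probabilistic (Fellegi-Sunter-style) agreement weight, where the chance that two *random* records agree on a value is roughly that value's relative frequency. The sketch below is illustrative only; the `m_prob` figure and the surname counts are invented for the example.

```python
import math
from collections import Counter

def agreement_weight(value, value_counts, total, m_prob=0.95):
    """Agreement weight for matching on `value`: log2(m/u), where
    u (chance agreement) is approximated by the value's relative frequency.
    As new entities shift the frequencies, these weights must be re-tuned."""
    u_prob = value_counts[value] / total
    return math.log2(m_prob / u_prob)

# Hypothetical surname frequencies in the master repository:
names = Counter({"Smith": 500, "Loshin": 2})
total = sum(names.values())

# Agreeing on a rare surname is much stronger evidence of a match
# than agreeing on a common one:
w_common = agreement_weight("Smith", names, total)
w_rare = agreement_weight("Loshin", names, total)
assert w_rare > w_common
```

Every batch of new records changes the counts in `names`, and with them the weights, which is exactly the rebalancing cost the paragraph above describes.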

On the other hand, waiting until a bunch of new entities have been collected means that the master index (and repository) will lag behind the operational systems that use the master data environment by the period over which the master index is updated. In that case, if an individual is added to an operational system’s data set and the data is queued for later insertion into the master index, that individual remains “hidden” from the enterprise until the next synchronization period. That suggests that perhaps every time a new entity record is created, the identifying information should be immediately registered in the master index!

The challenge, then, is to balance the completeness and currency of the master index ("full synchronization"), which requires a significant amount of processing to rebalance the matching probabilities, against a less complete master index ("periodic synchronization"), which may lead to seemingly missing entities as well as the need to screen out duplicate entries added to the insertion queue via different process paths. The right approach, of course, will depend on your applications' requirements, which we will explore in my next post.
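The periodic-synchronization side of that trade-off can be sketched as a staging queue that screens out duplicates before a batch flush. Again, this is a simplified illustration under my own assumptions, not a prescribed design; the `SyncQueue` class and its method names are hypothetical.

```python
from collections import deque

class SyncQueue:
    """Sketch of periodic synchronization: stage new entities, screen out
    duplicates arriving via different process paths, then flush as a batch."""

    def __init__(self):
        self.pending = deque()
        self.seen = set()

    def enqueue(self, attrs):
        """Queue an entity for later insertion; reject duplicates."""
        key = tuple(sorted(attrs.items()))
        if key in self.seen:   # same entity queued via another process path
            return False
        self.seen.add(key)
        self.pending.append(attrs)
        return True

    def flush(self, master_records):
        """Batch update: insert all queued entities, then (in a real system)
        recompute value frequencies and matching weights once per batch."""
        batch = list(self.pending)
        self.pending.clear()
        self.seen.clear()
        for attrs in batch:
            master_records.append(attrs)  # stand-in for index registration
        return len(batch)

q = SyncQueue()
q.enqueue({"first": "Ann", "last": "Lee"})
q.enqueue({"first": "Ann", "last": "Lee"})  # duplicate is screened out
master = []
inserted = q.flush(master)
```

Until `flush` runs, Ann Lee remains "hidden" from the enterprise, which is precisely the currency cost of deferring synchronization.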


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author regarding data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers, and web seminars on a variety of data management best practices. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com . David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.
