Master indexing and the unified view

In my last post I discussed aspects of data virtualization and suggested at the end that while structural differences can be smoothed out via a typical federation/virtualization scheme, the mechanism can be enhanced to incorporate semantic consistency within the federation framework. A master data repository is often perceived as a store into which shared master data from various sources is consolidated to provide a “golden record” or a “single source of truth.”

In simpler environments that don't need to observe strict data consistency and synchronization constraints, this approach may be sufficient. But in more complex systems, the creation of a master data repository is meant to address three specific objectives that, when applied in sequence, increase the value of the source data asset:

  1. Identity resolution – The master data environment catalogs the set of representations that each unique entity exhibits in the original source systems. Applying probabilistic aggregation and/or deterministic rules allows the system to determine that the data in two or more records refers to the same entity, even if the original contexts are different (a brief sketch of this linkage follows the list).
  2. Data quality improvement – Linking records that share data about the same real-world entity enables the application of business rules to improve the quality characteristics of one or more of the linked records. This doesn't necessarily mean that a single “golden copy” record must be created to replace all instances of the entity’s data. Instead, depending on the scenario and quality requirements, access to the different sources, combined with the ability to apply those business rules at the data user’s discretion, provides a consolidated view that best meets the data user’s requirements at the time the data is requested.
  3. Inverted mapping – Because the scope of data linkage performed by the master index spans the breadth of both the original sources and the collection of data consumers, it is uniquely positioned to act as a map from a standardized canonical representation of a specific entity to the original source records that have been linked via the identity resolution process.
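
To make the first and third objectives concrete, here is a minimal Python sketch of a master index built over a few hypothetical source records. The source systems, record keys, and the email/phone matching rule are illustrative assumptions, not anything from a particular product: a deterministic rule links records that refer to the same entity, and the resulting inverted map ties each canonical entity identifier back to its original source records.

```python
from collections import defaultdict

# Hypothetical source records, keyed by (source system, local record id).
SOURCE_RECORDS = {
    ("CRM", "c-101"):  {"name": "Jon Smith",  "email": "jsmith@example.com", "phone": None},
    ("ERP", "e-7734"): {"name": "John Smith", "email": "jsmith@example.com", "phone": "555-0142"},
    ("WEB", "w-88"):   {"name": "J. Smith",   "email": "jon.s@example.com",  "phone": "555-0142"},
}

def same_entity(rec_a, rec_b):
    """Deterministic matching rule: two records refer to the same entity if
    they share an email address or a phone number. A production matcher
    would layer probabilistic scoring over names, addresses, and so on."""
    return ((rec_a["email"] and rec_a["email"] == rec_b["email"])
            or (rec_a["phone"] and rec_a["phone"] == rec_b["phone"]))

def build_master_index(records):
    """Cluster source records into entities and return the inverted map
    from canonical entity id to the linked source record keys.
    (A single greedy pass is enough for this example; a fuller
    implementation would merge clusters transitively, e.g. via union-find.)"""
    entity_of = {}             # source key -> canonical entity id
    index = defaultdict(list)  # canonical entity id -> [source keys]
    next_id = 0
    for key, rec in records.items():
        assigned = next((entity_of[k] for k in entity_of
                         if same_entity(rec, records[k])), None)
        if assigned is None:
            assigned = f"entity-{next_id}"
            next_id += 1
        entity_of[key] = assigned
        index[assigned].append(key)
    return dict(index)

master_index = build_master_index(SOURCE_RECORDS)
print(master_index)
# {'entity-0': [('CRM', 'c-101'), ('ERP', 'e-7734'), ('WEB', 'w-88')]}
```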

In essence, this allows you to use a master data index to support federated access to the original source data while applying data quality rules upon delivery of the data, as sketched below.
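
To illustrate applying quality rules on delivery rather than persisting a single golden copy, the sketch below (continuing the hypothetical index and records above) assembles a consolidated view only when an entity is requested. The `longest` survivorship rule is a deliberately simple stand-in for whatever business rules a data consumer chooses to apply.

```python
def longest(values):
    """Default survivorship rule: keep the most complete (longest) value."""
    return max(values, key=len)

def resolve(entity_id, index, records, prefer=longest):
    """Assemble a consolidated view of one entity at request time rather
    than storing a golden record: fetch the source records linked by the
    master index, drop empty values, and pick a survivor per attribute
    using whatever rule the data consumer supplies."""
    linked = [records[key] for key in index[entity_id]]
    view = {}
    for attr in sorted({a for rec in linked for a in rec}):
        values = [rec[attr] for rec in linked if rec.get(attr)]
        view[attr] = prefer(values) if values else None
    return view

# Federated access through the index: the caller asks for an entity, not a
# source system, and the quality rules run as the data is delivered.
print(resolve("entity-0", master_index, SOURCE_RECORDS))
# {'email': 'jsmith@example.com', 'name': 'John Smith', 'phone': '555-0142'}
```

The same index could just as easily hand back the raw linked records when a consumer prefers the source view; the point of the design is that the consolidation rules travel with the request rather than with the storage.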


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, writing via the expert channel at b-eye-network.com and in numerous books, white papers, and web seminars. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003), has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com. David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.

1 Comment

  1. Dave Chamberlain

    This is a very succinct description of what to me would seem to make common sense. The ability to both improve/augment the quality of the data and identify duplicates before "mastering" is key...
