What is reference data harmonization?


A few weeks back I noted that one of the objectives of an inventory process for reference data was data harmonization, which meant determining when two reference sets refer to the same conceptual domain and harmonizing the contents into a conformed standard domain. It sounds relatively straightforward, but as with most data management techniques, its apparent simplicity hides a significant amount of complexity.

First, let's reconsider what reference data “harmonization” really means: taking two distinct data sets (data set A and data set B) that overlap to some measurable extent and merging those two data sets into a single data set. That merging process can be performed in a number of ways, including:

  1. Taking all the values of data set A and eliminating the non-intersecting values from data set B.
  2. Taking all the values of data set B and eliminating the non-intersecting values from data set A.
  3. Taking all of the values from both data set A and data set B.
  4. Taking some (or all) of the values from data set A and some (or all) of the values from data set B.
  5. Don’t merge the sets at all.

Of course, the first four of these alternatives are just the mechanical last step of a more complex process: deciding which values ultimately belong in the unified set.
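To make the mechanical step concrete, here is a minimal sketch of the first four options in the list above as plain set operations. The function name, the strategy labels, and the sample values are all hypothetical and chosen only for illustration. The sketch assumes the values in both sets are already directly comparable, which is exactly the part that is hard in practice.

```python
def harmonize(a, b, strategy, keep=None):
    """Merge two reference value sets using one of the listed options.

    Hypothetical helper: the strategy names are illustrative labels,
    not terms from any standard or tool.
    """
    a, b = set(a), set(b)
    if strategy == "keep_a":    # option 1: all of A; values only in B are dropped
        return a
    if strategy == "keep_b":    # option 2: all of B; values only in A are dropped
        return b
    if strategy == "union":     # option 3: every value from both sets
        return a | b
    if strategy == "select":    # option 4: a reviewed subset of the combined values
        if keep is None:
            raise ValueError("the 'select' strategy needs a keep predicate")
        return {v for v in a | b if keep(v)}
    # Option 5 (no merge) needs no code: the caller simply keeps both sets.
    raise ValueError(f"unknown strategy: {strategy!r}")


# Made-up sample sets that overlap on two values.
set_a = {"US", "CA", "MX"}
set_b = {"US", "CA", "UK"}

print(sorted(harmonize(set_a, set_b, "union")))
print(sorted(harmonize(set_a, set_b, "select", keep=lambda v: v != "UK")))
```

The predicate passed for option 4 stands in for the real decision process. In practice it would encode whatever the data stewards conclude about which values belong in the conformed domain.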

There are two challenges that must be overcome before we even get to this point. The first, determining that the value domains refer to the same conceptual domain, involves mapping the values in the reference data set to a set of valid value meanings. The second challenge involves identifying the authoritative definitions that guide the choice of valid values. We will examine both of these tasks in the next set of posts.
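As a rough illustration of the first challenge, the sketch below maps two differently coded reference sets to their value meanings and measures how much those meanings overlap. The code sets, the function names, and the normalization rule (trim and lowercase) are assumptions made for this example. Real meanings rarely line up this neatly, and deciding that two meanings are equivalent depends on the authoritative definitions mentioned above.

```python
def meanings(reference_set):
    """Return the normalized value meanings of a code -> meaning mapping.

    Assumption: trimming and lowercasing is enough to compare meanings;
    real data usually needs a curated mapping instead.
    """
    return {meaning.strip().lower() for meaning in reference_set.values()}


def meaning_overlap(ref_a, ref_b):
    """Return (shared meanings, fraction of all meanings that are shared)."""
    a, b = meanings(ref_a), meanings(ref_b)
    combined = a | b
    shared = a & b
    ratio = len(shared) / len(combined) if combined else 0.0
    return shared, ratio


# Hypothetical reference sets: different codes, mostly the same meanings.
ref_a = {"M": "Male", "F": "Female"}
ref_b = {"1": "male", "2": "female", "9": "unknown"}

shared, ratio = meaning_overlap(ref_a, ref_b)
print(sorted(shared), round(ratio, 2))
```

A high ratio only suggests that the two value domains may refer to the same conceptual domain. It does not establish it, because two sets can share labels while defining them differently.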


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author regarding data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers, and web seminars on a variety of data management best practices. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com. David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.
