Reference data harmonization


We have looked at two reference data sets whose code values are distinct yet map equivalently to the same conceptual domain. We have also looked at two reference data sets whose value sets largely overlap, though not equivalently. Lastly, we began the discussion about the guidelines for determining when reference data sets can be harmonized. In this last post of this month’s series, let’s look at some practical steps for harmonization.

In this case, we have two reference data sets and have decided to harmonize them into a single surviving reference data set. Here are some basic steps to take:

  1. Choose a target data set: Determine the level of authority of the two reference domains. Identify the authoritative sources that provided the enumerations. If one source is “more authoritative” (your mileage may vary) than the other, choose that reference data set. Otherwise, use corporate guidance to select one of the data sets as the survivor, thereby designating the other data set as retired.
  2. Develop transformation: Create the transformation mapping from the values in the retired data set to the survivor data set.
  3. Retired data values: For each value in the retired data set that is not in the survivor set, determine how to transform the value into one that is in the survivor set.
  4. Transform data values: For every instance of use of the retired data set, apply the transformation.
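
Steps 2 through 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation: the country codes, the `MANUAL_DECISIONS` table, and the function names are all hypothetical, and the sketch assumes the two sets share a code format, so a retired value that already appears in the survivor set maps to itself.

```python
# Hypothetical reference data sets (assumed for illustration only).
SURVIVOR_CODES = {"US", "CA", "MX", "GB"}
RETIRED_CODES = {"US", "CA", "UK"}

# Step 3: decisions made by a data steward for retired values
# that do not appear in the survivor set (hypothetical).
MANUAL_DECISIONS = {"UK": "GB"}


def build_transformation(retired, survivor, manual):
    """Step 2: map every retired value to a value in the survivor set."""
    mapping = {}
    for code in retired:
        if code in survivor:
            # Shared values carry over unchanged.
            mapping[code] = code
        elif code in manual:
            target = manual[code]
            if target not in survivor:
                raise ValueError(f"{code!r} maps to {target!r}, which is not a survivor value")
            mapping[code] = target
        else:
            raise ValueError(f"no replacement decided for retired value {code!r}")
    return mapping


def apply_transformation(records, field, mapping):
    """Step 4: rewrite one field in every record that uses the retired set."""
    return [{**record, field: mapping[record[field]]} for record in records]


mapping = build_transformation(RETIRED_CODES, SURVIVOR_CODES, MANUAL_DECISIONS)

customers = [{"id": 1, "country": "UK"}, {"id": 2, "country": "US"}]
harmonized = apply_transformation(customers, "country", mapping)
```

Raising an error on an undecided value is a deliberate choice in this sketch: it stops the transformation before any record is silently left with a retired code.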

If you cannot find an adequate and correct replacement for the retired values, it suggests that perhaps the data sets could not be harmonized after all. Note that the last step hides some additional complexity associated with reference data lineage, and I will address that in a future blog series.
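
One way to test for this before transforming anything is to list the retired values that still lack a replacement. The following is a minimal sketch; the status codes, the `decisions` table, and the function name `unresolved_values` are assumptions made up for the example.

```python
def unresolved_values(retired, survivor, decisions):
    """Return retired values with no replacement in the survivor set.

    A value is resolved if it already appears in the survivor set, or if
    `decisions` maps it to a value that does.
    """
    return sorted(
        code
        for code in retired
        if code not in survivor and decisions.get(code) not in survivor
    )


# Hypothetical account-status reference data.
retired = {"ACTIVE", "CLOSED", "PENDING", "FROZEN"}
survivor = {"ACTIVE", "CLOSED", "SUSPENDED"}
decisions = {"FROZEN": "SUSPENDED"}

print(unresolved_values(retired, survivor, decisions))  # ['PENDING']
```

A non-empty result does not settle the question by itself, but each value on the list needs either a defensible replacement or a second look at whether the two sets really describe the same domain.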


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author regarding data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers, and web seminars on a variety of data management best practices. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com . David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.
