Entity resolution in isolation

The conclusion from my last post was that entity resolution can indeed exist as a product separate from master data management (MDM). However, the benefit of integrating it with MDM is that its use is embedded directly within the MDM application, which reduces the level of expertise users need to take advantage of the capability.

There is still a need for expertise in tuning the parameters of the matching routines, but in general, once those knobs are set, the MDM application will chug along as directed.

That being said, entity resolution is not truly an application per se, but is more like a utility that can be incorporated into various services to meet application requirements. We can boil the “architecture” of the utility down to a few simple operational concepts:

1)    Enable a method for establishing what I would call “semantic alignment” between two data instance models that maps sets of data elements in one model to a semantically equivalent set of data elements in the second model.

2)    Given two data instances whose data elements can be semantically aligned, provide a “scoring” method that measures the “similarity” between the two data instances.

3)    Given a “search” data instance and a pool of “persistent” data instances, select the persistent data instances with the highest likelihood of matching the search data instance.

4)    Provide a means for taking a collection of search data instances and iteratively matching each against a pool of persistent data instances.

5)    Enable adjustments to matching algorithms and strategies to tune the precision of scoring and matching.
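
To make the five concepts a little more concrete, here is a minimal Python sketch of how they might fit together. Everything in it is an assumption made for illustration: the field names, the `ALIGNMENT` mapping, the `MatchConfig` weights and threshold, and the use of the standard library's `difflib.SequenceMatcher` as the similarity measure are all hypothetical choices, not the design of any particular entity resolution product.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# All names below are hypothetical and chosen only for illustration.

# (1) Semantic alignment: map each data element of the "search" model to the
# semantically equivalent data element of the "persistent" model.
ALIGNMENT = {"full_name": "name", "zip": "postal_code", "email": "email_address"}


@dataclass
class MatchConfig:
    # (5) Tunable knobs: per-field weights, acceptance threshold, result size.
    weights: dict = field(
        default_factory=lambda: {"full_name": 0.5, "zip": 0.2, "email": 0.3}
    )
    threshold: float = 0.8
    top_n: int = 3


def field_similarity(a, b):
    """Similarity in [0, 1], ignoring case and surrounding whitespace."""
    a = str(a or "").strip().lower()
    b = str(b or "").strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def score(search, persistent, config):
    """(2) Weighted similarity between two semantically aligned instances."""
    total = sum(config.weights.values())
    weighted = sum(
        w * field_similarity(search.get(f), persistent.get(ALIGNMENT[f]))
        for f, w in config.weights.items()
    )
    return weighted / total if total else 0.0


def candidates(search, pool, config):
    """(3) Keep the persistent instances most likely to match, best first."""
    scored = [(score(search, p, config), p) for p in pool]
    kept = [sp for sp in scored if sp[0] >= config.threshold]
    kept.sort(key=lambda sp: sp[0], reverse=True)
    return kept[: config.top_n]


def match_all(searches, pool, config):
    """(4) Match each search instance in a collection against the pool."""
    return [candidates(s, pool, config) for s in searches]


# Made-up sample data: one exact match, one near match, one non-match.
pool = [
    {"name": "jane smith", "postal_code": "20910", "email_address": "JANE@example.com"},
    {"name": "Jane Smyth", "postal_code": "20910", "email_address": "jane@example.com"},
    {"name": "Robert Jones", "postal_code": "94105", "email_address": "rjones@example.org"},
]
search = {"full_name": "Jane Smith", "zip": "20910", "email": "jane@example.com"}

config = MatchConfig()
results = match_all([search], pool, config)
```

With the default settings, `results[0]` holds the two "Jane" records in descending score order and drops the unrelated one. Raising `threshold` or shifting weight toward `full_name` is the kind of knob-turning that concept 5 describes; a production matcher would also need blocking or indexing so that each search instance is not compared against the entire pool.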

With these core pieces of functionality, you should be able to formulate services to address most traditional non-MDM usage scenarios for entity resolution. But recognize that to consider the capability an application instead of a utility, additional functionality (such as a control panel or governance methods for data correction) will need to be layered on top of the core componentry.

About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author regarding data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers, and web seminars on a variety of data management best practices. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com . David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.
