Linking identifiers for improved analytics


In my last post we started looking at the issue of identifier proliferation, in which different business applications assigned their own unique identifiers to data representing the same entities. Even master data management (MDM) applications are not immune to this issue, particularly because of the inherent semantics associated with the assignment of the identifier to the class of entity to be managed.

I recently came across an interesting example while working with a client’s MDM system. The system was intended to unify all data about customers. One complexity was that there were two kinds of customers: residential and business. Each residential customer was assigned a unique “residential customer” identifier, and each business customer was assigned a unique “business customer” identifier. In some cases, the business was a sole proprietorship: a single individual running his or her own small business.

Later in the development process, the developers determined that they needed to keep track of relationships between individuals and businesses, such as employment, ownership or support relationships. Each of these individuals was assigned a unique “person” identifier and associated with the related “business customer” identifier. The problem was that some of the individuals assigned a “person” identifier already existed in the “residential customer” domain, and some already existed as sole proprietors in the “business customer” domain. This led to two, sometimes three, and potentially more master records representing the same individual! As a result, affiliated records for these individuals (such as contact information and location information) could not be managed through a single identifier, leading (yet again) to duplicated and inconsistent data.

The seemingly obvious issue is the need for additional identity resolution to link the different identifiers together. But the real issue is that we need to consider the right master domain models before building the system for identity resolution and assigning unique identifiers.
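To make the after-the-fact linking step concrete, here is a minimal sketch in Python. It assumes that some upstream matching process (not shown, and not described in the client example) has already produced pairs of identifiers judged to refer to the same individual. The function name link_identifiers and the identifier values are hypothetical, chosen only for illustration. The pairs are merged into clusters with a simple union-find structure.

```python
from collections import defaultdict


def link_identifiers(matched_pairs):
    """Group identifiers that refer to the same individual.

    matched_pairs: iterable of (id_a, id_b) tuples that an upstream
    matching step (assumed, not shown) judged to be the same person.
    Returns a list of sets, one set per resolved individual.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups = defaultdict(set)
    for identifier in list(parent):
        groups[find(identifier)].add(identifier)
    return list(groups.values())


# Hypothetical identifiers from the three domains in the example.
pairs = [
    ("PERSON-17", "RES-1001"),  # person record matches a residential customer
    ("PERSON-17", "BUS-2002"),  # the same person is also a sole proprietor
    ("PERSON-42", "RES-3003"),
]
clusters = link_identifiers(pairs)
```

Each resulting cluster is effectively a crosswalk entry: three identifiers for one individual in the first case, two in the second. This patches the symptom, but the crosswalk must be maintained indefinitely, which is why the modeling question matters more.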

Creating a “business customer” or “residential customer” identifier locks the meaning to the identifier instead of the entity. The upshot is that subsequent analyses will not be able to provide insights into correlations about individuals who play multiple roles in multiple contexts. For example, you might want to know when certain individuals have ownership roles associated with multiple (different) businesses, or whether a sole proprietor is also acting as a residential customer.

Master data modeling is critical to reducing the complexity of identifier proliferation as well as ensuring the quality of analytics. In any case where entities exist in different contexts and are assigned different roles, reconsider how the model can capture the core information about each unique entity – as well as how that entity plays different roles in those contexts. The right model will enforce the uniqueness of entity data management and make it simpler to manage consistency across the enterprise.
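One way to picture such a model is a small sketch in Python that gives each real-world entity exactly one identifier and records roles separately. This is an illustration under assumptions, not the client's actual design: the class names Party and RoleAssignment, the role labels, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    party_id: str  # one identifier per real-world entity
    name: str
    kind: str      # "person" or "business"


@dataclass(frozen=True)
class RoleAssignment:
    party_id: str         # the entity playing the role
    role: str             # e.g. "residential_customer", "owner"
    context_id: str = ""  # related business party, when applicable


# Hypothetical sample data.
parties = [
    Party("P1", "Alex Rivera", "person"),
    Party("P2", "Sam Lee", "person"),
    Party("B1", "Rivera Plumbing", "business"),
    Party("B2", "Lakeside Cafe", "business"),
    Party("B3", "Harbor Books", "business"),
]
roles = [
    RoleAssignment("P1", "residential_customer"),
    RoleAssignment("P1", "sole_proprietor", "B1"),
    RoleAssignment("P2", "owner", "B2"),
    RoleAssignment("P2", "owner", "B3"),
    RoleAssignment("B1", "business_customer"),
    RoleAssignment("B2", "business_customer"),
    RoleAssignment("B3", "business_customer"),
]


def owners_of_multiple_businesses(assignments):
    """Return party_ids holding an owner role in more than one business."""
    owned = defaultdict(set)
    for a in assignments:
        if a.role == "owner":
            owned[a.party_id].add(a.context_id)
    return {pid for pid, businesses in owned.items() if len(businesses) > 1}


def sole_proprietors_who_are_residential(assignments):
    """Return party_ids that are both sole proprietors and residential customers."""
    proprietors = {a.party_id for a in assignments if a.role == "sole_proprietor"}
    residential = {a.party_id for a in assignments if a.role == "residential_customer"}
    return proprietors & residential


multi_owners = owners_of_multiple_businesses(roles)
dual_role = sole_proprietors_who_are_residential(roles)
```

Because the meaning lives in the role rather than in the identifier, both of the questions raised earlier become simple lookups over one set of party identifiers, with no cross-domain identity resolution needed at query time.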



About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, with numerous books, white papers, and web seminars on the subject. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003), has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders. David is also the author of The Practitioner’s Guide to Data Quality Improvement.
