Managing the master index: batch vs. real-time


In my last post, I introduced a number of questions that might be raised during a master data integration project, and I suggested that the underlying subtext of synchronization lay at the core of each of those issues. It is worth considering an example application to illustrate those points in an environment that is sensitive to synchronizing data among the various sources and process stages.

Let’s consider the use of a master customer index that links customer data among three sales channels: brick and mortar, telesales, and electronic commerce. In principle, the information about any individual who makes a purchase through any of these channels is expected to be available to all of these channels (as well as to other business functions such as finance or customer service). For the purposes of our discussion, let’s assume that a customer’s name, telephone number, and street address are sufficient for unique identification. In addition, let’s presume we have a fully functional master index populated and in production.

Using these assumptions, consider this use case: A new customer buys a product online. At the point of sale, the individual is prompted for the identifying information relevant to unique identification, which must be provided before the sale can continue.

Now what happens? That identifying information is submitted to the identity resolution engine to perform a lookup in the master index to see if this individual is already known as a customer. If so, the customer’s information is accessed from the repository and is used to complete the transaction. If not, the individual is a new customer, and that individual’s identifying information has to be added to the environment.
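The lookup-then-add flow can be sketched in code. This is a minimal in-memory sketch, not a real identity resolution engine: the `CustomerIdentity` and `MasterIndex` classes are hypothetical, and matching here is a simple normalized exact-key lookup, whereas a production engine would apply far more sophisticated matching logic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerIdentity:
    """The three identifying attributes assumed in this discussion."""
    name: str
    phone: str
    address: str

    def key(self) -> tuple:
        # Normalize the attributes so trivial formatting differences
        # (case, phone punctuation, stray spaces) do not defeat a match.
        return (
            self.name.strip().lower(),
            "".join(ch for ch in self.phone if ch.isdigit()),
            self.address.strip().lower(),
        )

class MasterIndex:
    """Toy master index keyed on the normalized identity."""

    def __init__(self):
        self._records = {}

    def resolve(self, identity: CustomerIdentity):
        # Lookup: returns the known customer record, or None if this
        # individual is a new customer.
        return self._records.get(identity.key())

    def add(self, identity: CustomerIdentity) -> dict:
        # A new customer: add the identifying information to the index.
        record = {"identity": identity, "channels": set()}
        self._records[identity.key()] = record
        return record
```

With this sketch, the point-of-sale logic is simply: call `resolve()`; if it returns a record, complete the transaction with it; otherwise call `add()`.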

There are two approaches for adding the new customer: as part of a batch of new identities added on a periodic (daily) basis, or in real time. Either approach poses challenges. For example, if the customer’s information is added to a periodic batch, that customer remains “invisible” to the master environment until the next batch sequence completes. On the other hand, adding customers in real time has a performance impact on the identity resolution engine, especially under high load, since the index may need to be globally updated to reflect merged identities or recognized relationships that are exposed as new customer data is added. We will examine this in more detail in my next post.
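The trade-off between the two approaches can be illustrated with a small sketch. The writer classes below are hypothetical and the index is just a dictionary; the point is only to show the visibility gap: a batched customer cannot be resolved until the next flush, while a real-time write is visible immediately (at the cost of every sale contending with lookups on the engine).

```python
import queue

class BatchingWriter:
    """Queues new customers; they reach the shared index only when
    flush() runs (e.g., once nightly). Until then, the customer is
    'invisible' to lookups against the master environment."""

    def __init__(self, index: dict):
        self.index = index
        self.pending = queue.Queue()

    def submit(self, key, record):
        # Deferred: not yet visible in the index.
        self.pending.put((key, record))

    def flush(self):
        # The periodic batch job applies all queued additions at once.
        while not self.pending.empty():
            key, record = self.pending.get()
            self.index[key] = record

class RealTimeWriter:
    """Writes each new customer into the shared index immediately:
    no visibility gap, but every sale adds write load to the engine."""

    def __init__(self, index: dict):
        self.index = index

    def submit(self, key, record):
        self.index[key] = record
```

In the batch case, a lookup between `submit()` and `flush()` misses the customer, which is exactly the window during which a repeat purchase could create a duplicate identity.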


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author regarding data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers, and web seminars on a variety of data management best practices. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003), has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com. David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.

