20 encounters of the information management kind – #5 The data warehouse has more records than the master data

0

Once in a while we run into issues where a business customer assumes the number of records in the data warehouse - for customer or product - should match exactly the records in the master data management data store or the original source system. There are many reasons why the numbers don’t match. Here are a few of my thoughts:

1. Time is part of the data warehouse if, in fact, we are storing product data with time as part of the primary key. There could be multiple records for each product in the data warehouse. For example, Product AYZ (or the widget) has characteristics (dimensions) that change based on seasons. For winter, Product AYZ only comes in the color of black; summer – white, spring – blue and fall – brown. The objective in our data warehouse is to track sales by season for Product AYZ. The product dimension would have time based by season (season 1, 2, 3, 4) associated with the product or a different unique product number for each of the four products, thus requiring us to query based on the product name to see the results we would like to analyze.

2. Another issue that may make the counts different would be using "not applicable" or "unknown" for product sales information that is not described properly or does not sync up with a specific product. So the product dimension may have a row with a product name of "not applicable" or "unknown." This would result in at least one more record than master data.

3. Also in the data warehouse we may keep all records for the last seven years, each with unique statuses like "delete" or "archived." Whereas the source system or master data management system may do "physical" deletes.

Let me know of any other instances where the data warehouse would have more records than master data.

Share

About Author

Joyce Norris-Montanari

President of DBTech Solutions, Inc

Joyce Norris-Montanari, CBIP-CDMP, is president of DBTech Solutions, Inc. Joyce advises clients on all aspects of architectural integration, business intelligence and data management. Joyce advises clients about technology, including tools like ETL, profiling, database, quality and metadata. Joyce speaks frequently at data warehouse conferences and is a contributor to several trade publications. She co-authored Data Warehousing and E-Business (Wiley & Sons) with William H. Inmon and others. Joyce has managed and implemented data integrations, data warehouses and operational data stores in industries like education, pharmaceutical, restaurants, telecommunications, government, health care, financial, oil and gas, insurance, research and development and retail. She can be reached at jmontanari@earthlink.net.

Leave A Reply

Back to Top