The other day, across the street at the gas station, there was a bear in the dumpster. This bear was definitely dumpster diving in the hopes of finding a few nuggets of great stuff to eat prior to the "big sleep." Everyone was clapping to try to get the bear to leave, and eventually the bear left with a bag of what he hoped was great food.
The concept of dumpster diving for data would have the data management team (i.e., the data quality team) diving into the data and looking for records that could have one of these issues:
- A child record that has no parent. This usually happens because referential integrity is turned off and foreign keys are not checked when a new record is inserted, which seems to be a common practice in some data warehousing environments. So the matching of parent to child becomes the burden of the ETL tool. Ask yourself this: is all the ETL you have in your libraries correct? If not, then TAKE A DIVE into your data and see how you stack up (a quick sketch of this check follows the list).
- Are all the NOT NULL attributes really populated with real values? The other day I found some quantities that had been set to 0 in some fact tables. Unless the front end excludes them, those zeros get counted in averages. If your data could have this issue, then TAKE A DIVE into the data with your profiling tool and decide whether this is something you want to monitor AUTOMATICALLY in the future (see the second sketch after the list).
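If you want a concrete way to take that first dive, here is a minimal sketch in Python with pandas. The tables and the customer_id key are hypothetical stand-ins; the real check would run against your own parent and child tables:

```python
import pandas as pd

# Hypothetical parent (dimension) and child (fact) tables.
# Table and column names are illustrative, not from any real schema.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 2, 4, None],  # 4 has no parent; None is a missing key
})

# Anti-join: keep child rows whose foreign key has no matching parent key.
# Rows with a missing key are flagged too, since they also have no parent.
orphans = orders[~orders["customer_id"].isin(customers["customer_id"])]
print(orphans)
```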
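And here is a sketch of the second dive, again with made-up table and column names, profiling a quantity column for zeros that may be masquerading as nulls and showing how much they would skew an average:

```python
import pandas as pd

# Hypothetical fact table; "quantity" is an illustrative column name.
sales = pd.DataFrame({
    "sale_id": [1, 2, 3, 4, 5],
    "quantity": [5, 0, 12, 0, 8],
})

zeros = sales[sales["quantity"] == 0]
print(f"{len(zeros)} of {len(sales)} rows have quantity = 0")

# Compare the average with and without the suspect zeros.
print("average including zeros:", sales["quantity"].mean())
print("average excluding zeros:",
      sales.loc[sales["quantity"] != 0, "quantity"].mean())
```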
Dumpster diving your own data may not be optional. BE THE BEAR!