Dynamic data and coalescing quality

In my last post, I pointed out that we data quality practitioners want to apply data quality assertions to data instances to validate data in process, but the dynamic nature of that data conflicts with our assumptions about how quality measures are applied to static records. In practice, the data used in conjunction with a business process may not be “fully formed” until the business process completes. This means that records may exist within the system that would be designated as invalid after the fact, yet from a practical standpoint remain valid at various points in time until the process completes.

In other words, the quality characteristics of a data instance are temporal in nature and dependent on the context of the process. For example, we might say that a record representing a patient’s hospital admittance and stay is valid if it has an admission date and a discharge date, and that the discharge date must be later than the admission date. However, while the patient is still in the hospital, the record is valid even though it is missing a discharge date.
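To make that concrete, here is a minimal sketch in Python of the two assertions. The dictionary record and field names (admission_date, discharge_date) are illustrative assumptions, not any particular system's schema:

```python
from datetime import date

def valid_completed_stay(record: dict) -> bool:
    # The "after the fact" assertion: both dates present, discharge later than admission.
    return (record.get("admission_date") is not None
            and record.get("discharge_date") is not None
            and record["discharge_date"] > record["admission_date"])

def valid_open_stay(record: dict) -> bool:
    # While the patient is still in the hospital, only the admission date is required.
    return record.get("admission_date") is not None

in_progress = {"admission_date": date(2014, 3, 1), "discharge_date": None}
assert valid_open_stay(in_progress)            # valid at this point in the process
assert not valid_completed_stay(in_progress)   # invalid only if judged "after the fact"
```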

This means that if you want to apply data quality rules that are embedded within a process, the rules themselves must be attributed with their “duration of validity.” More simply, if you want to integrate a data quality/validation rule within a process, the rule itself must be applicable at that point in time; if it is not, the test itself is not valid. The rules have to reflect the quality assertions that hold at that point in the process, not expectations about the data after the process is complete. (For more on data quality in a big data world, read my white paper: "Understanding Big Data Quality for Maximum Information Usability.")
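One way to attribute rules with their duration of validity is to tag each assertion with the process stages at which it is expected to hold. The sketch below assumes an illustrative three-stage hospital process (admitted, in_treatment, discharged); the stage and rule names are hypothetical, not drawn from any particular tool:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

@dataclass(frozen=True)
class QualityRule:
    name: str
    applies_at: FrozenSet[str]      # process stages at which the assertion must hold
    check: Callable[[dict], bool]   # the assertion itself

STAGES = ("admitted", "in_treatment", "discharged")   # illustrative stage names

RULES = [
    QualityRule("has_admission_date", frozenset(STAGES),
                lambda r: r.get("admission_date") is not None),
    QualityRule("has_discharge_date", frozenset({"discharged"}),
                lambda r: r.get("discharge_date") is not None),
    QualityRule("discharge_after_admission", frozenset({"discharged"}),
                lambda r: r.get("discharge_date") is None
                          or r["discharge_date"] > r["admission_date"]),
]
```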

The upside of this approach is that it forces the analyst to consider the lifecycle of the record, and how the expectations may change across that lifecycle. A record that is incomplete at one point in the process is valid, yet at a later point, the incompleteness is no longer permissible. From a practical standpoint, integrated data validation can then identify those process points where previously permissible gaps are no longer allowed, creating the opportunity for proactive steps to be taken to ensure the record’s quality. The quality of the data is allowed to coalesce, or “gel,” over time until the point at which all the quality assertions must be true.
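Continuing the sketch above, a validation checkpoint embedded in the process applies only the rules attributed to that point, so a record whose gaps are still permissible passes early on and is flagged only once the process reaches the point where completeness is required:

```python
from datetime import date

def validate_at(stage: str, record: dict, rules=RULES) -> list:
    # Names of the rules that fail for this record at this point in the process.
    return [rule.name for rule in rules
            if stage in rule.applies_at and not rule.check(record)]

open_stay = {"admission_date": date(2014, 3, 1), "discharge_date": None}

assert validate_at("in_treatment", open_stay) == []                    # gaps still permissible
assert validate_at("discharged", open_stay) == ["has_discharge_date"]  # gap no longer allowed
```

The design choice in this sketch is simply that applicability travels with the rule rather than being hard-coded into each checkpoint, which mirrors the idea that the quality assertions themselves carry a duration of validity.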

About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, via the expert channel at b-eye-network.com and through numerous books, white papers, and web seminars. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com. David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.
