Static Models and Dynamic Data

After working in the data quality industry for a number of years, I have realized that most practitioners tend to have a rather rigid perception of assertions about data quality: either a data set conforms to the defined set of data quality criteria and is deemed acceptable, or it fails to meet those levels of acceptability and is deemed flawed.

I suspect that our tendency to designate a data set as being of “acceptable quality” (based on a discrete assessment) is an artifact of data warehousing, in which a data set is extracted, transformed and loaded as a single, static unit. Quality characteristics are measured en masse to produce an overall score for a static collection of records representing the underlying data model.

Our data quality rules are typically defined in relation to the underlying data model, with the assumption that all of a modeled entity’s attributes will have already been populated. In the data warehouse scenario, this holds true, since the data set is extracted after the transaction processing has already been done. A similar perception seems to exist at the time the entity is modeled in the first place: the completeness of the entity record is a necessary assumption; it wouldn't make sense to design an entity model with data attributes to which values are not assigned!
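To make that assumption concrete, here is a minimal sketch of a completeness-style rule defined against the model. The Customer entity and its attribute names are hypothetical, not drawn from any particular system.

```python
# A minimal sketch of a completeness rule defined against the model.
# The Customer entity and its attribute names are hypothetical.

REQUIRED_CUSTOMER_ATTRIBUTES = ["customer_id", "name", "billing_address", "credit_status"]

def is_complete(record: dict) -> bool:
    """The rule presumes every modeled attribute has already been assigned a value."""
    return all(record.get(attr) not in (None, "") for attr in REQUIRED_CUSTOMER_ATTRIBUTES)

print(is_complete({"customer_id": "C-1001", "name": "Acme Corp",
                   "billing_address": "10 Main St", "credit_status": "approved"}))  # True
print(is_complete({"customer_id": "PROV-17", "name": "New Customer"}))              # False
```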


However, as data quality practitioners seek to extend data validation rules to transaction and operational processing, it occurs to me that there are flaws in our thinking that could lead us to draw mistaken conclusions about the quality of data. The gap in the reasoning is that our data quality rules presume a static representation, while the data supporting transaction and operational processing is dynamic.

Here is an example: we assert that no sales transaction can be processed unless the customer has a valid customer account. While this may be a valid assertion by the time the day’s transaction records are loaded into the data warehouse, it might not hold true while the actual sales process is executing.
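In its static, batch-oriented form, that rule is easy to express. The sketch below assumes transactions arrive as simple records with a hypothetical customer_account field and that the set of valid accounts is known up front:

```python
# A minimal sketch of the rule in its static, batch form. Field names and the
# notion of a known set of valid accounts are illustrative assumptions.

def find_invalid_account_transactions(transactions, valid_accounts):
    """Flag every sales transaction whose customer account is not (yet) valid."""
    return [txn for txn in transactions
            if txn.get("customer_account") not in valid_accounts]

valid_accounts = {"C-1001", "C-1002"}
transactions = [
    {"txn_id": "T-1", "customer_account": "C-1001", "amount": 250.00},
    {"txn_id": "T-2", "customer_account": "PROV-17", "amount": 99.95},
]

# T-2 is reported as a failure, even though (as the scenario below shows)
# it may simply not have finished its workflow yet.
print(find_invalid_account_transactions(transactions, valid_accounts))
```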

Consider this scenario: a new customer attempts to buy a product. At that point the customer does not have an account. A temporary account is created and the customer is assigned a provisional customer identifier so that the transaction can be recorded. At the same time, credit information is collected from the prospective customer and submitted for review by the credit and finance department. Once the prospective customer’s credit information has been vetted, the customer’s account is upgraded from provisional to valid.

At a later point, the recorded sales transaction is processed, and the customer’s identifier is looked up to determine whether the customer is still in provisional status or has been upgraded. If the customer is still provisional, the transaction is put on hold until the next time transactions are processed. If the customer has been upgraded, the transaction is processed and the order is sent to fulfillment.
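Here is one way the dynamic version of that scenario might look in code. This is only a sketch: the CustomerAccount structure, the status values and the send_to_fulfillment step are all illustrative assumptions, not the workflow of any particular system.

```python
from dataclasses import dataclass
from itertools import count

_provisional_ids = count(1)

@dataclass
class CustomerAccount:
    customer_id: str
    name: str
    status: str  # "provisional" or "valid"

def create_provisional_account(name: str) -> CustomerAccount:
    """Assign a provisional identifier so the sale can be recorded right away."""
    return CustomerAccount(f"PROV-{next(_provisional_ids)}", name, "provisional")

def upgrade_after_credit_review(account: CustomerAccount, approved: bool) -> None:
    """Once credit and finance have vetted the customer, promote the account."""
    if approved:
        account.status = "valid"

def send_to_fulfillment(txn: dict) -> None:
    """Placeholder for handing the order off to fulfillment."""
    print(f"Fulfilling order {txn['txn_id']}")

def process_recorded_transaction(txn: dict, account: CustomerAccount) -> str:
    """Process a recorded sale only if the customer's account has been upgraded."""
    if account.status == "provisional":
        # Not an error: the record is still mid-workflow, so it is held for
        # the next processing run rather than flagged as a quality failure.
        return "held"
    send_to_fulfillment(txn)
    return "fulfilled"

# A new customer buys a product before an account exists.
account = create_provisional_account("New Customer")
txn = {"txn_id": "T-2", "customer_account": account.customer_id, "amount": 99.95}

print(process_recorded_transaction(txn, account))   # held
upgrade_after_credit_review(account, approved=True)
print(process_recorded_transaction(txn, account))   # Fulfilling order T-2, then fulfilled
```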

What this means is that at some point in time there will be sales transactions for customers who do not have a valid customer account, and that is not an error. Rather, the records are still changing, and the assertions of quality cannot be applied until the workflow process has completed and the data has been resolved from its dynamic state to a static state.
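One way to respect that distinction, sketched below under the same illustrative assumptions, is to evaluate the account-validity rule only once a transaction’s workflow has reached a final, static state:

```python
# Illustrative only: the state names and record layout are assumptions.
FINAL_STATES = {"fulfilled", "cancelled"}

def account_rule_applies(workflow_state: str) -> bool:
    """The assertion is only meaningful once the record has stopped changing."""
    return workflow_state in FINAL_STATES

def check_valid_account(txn: dict, workflow_state: str, valid_accounts: set) -> bool:
    """Return False only for genuine violations, not for records still in flight."""
    if not account_rule_applies(workflow_state):
        return True  # dynamic state: defer the data quality assertion
    return txn.get("customer_account") in valid_accounts
```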

In other words, the models may be static, but the data is dynamic. But if the data is dynamic, how can data quality rules be applied to the data in process? More next time…
