The Fourth Law of Data Quality

The First Law of Data Quality explained the importance of understanding your Data Usage, which is essential to the proper preparation required before launching your data quality initiative.

The Second Law of Data Quality explained the need for maintaining your Data Quality Inertia, which means a successful data quality initiative requires a program—and not a one-time project.

The Third Law of Data Quality explained that a fundamental root cause of data defects is assuming data quality is someone else’s responsibility, which is why Data Quality is Everyone’s Responsibility.

The Data-Information Continuum

Whether it is an abstract description of real-world entities (i.e., “master data”) or an abstract description of real-world interactions (i.e., “transaction data”) among entities, all data is an abstract description of reality.

These abstract descriptions can never be perfected since there will always be what I call a digital distance between data and reality.

I also make a distinction between data and information, which I view as interrelated entities forming what I like to call The Data-Information Continuum.

Although a common definition for data quality is fitness for the purpose of use, the common challenge is that all data has multiple uses—and each specific use has its own specific fitness requirements.

Viewing each specific use as the information that is derived from data, I define information as data in use or data in action.

Therefore, information is customized to meet the subjective needs of a particular business unit and/or a particular tactical or strategic initiative. In other words, the information is customized data used as the basis for making a business decision.

This is why data quality has both objective and subjective dimensions.

Although data’s quality can be objectively measured separately from its many uses (i.e., data can be fit to serve as the basis for each and every purpose by attempting to maintain an accurate description of reality), information’s quality can only be subjectively measured according to its specific use.

The Fourth Law of Data Quality

Most organizations suffer from a lack of a shared business understanding, or what I like to call a Shared Version of the Truth.

Objective data quality standards provide a highest common denominator to be used by all business units throughout the enterprise as an objective data foundation for their operational, tactical, and strategic initiatives.

Subjective information quality standards (starting from the objective data foundation) are customized to meet the subjective needs of each business unit and initiative.

This approach leverages a consistent enterprise understanding of data while also providing the information necessary for day-to-day operations.

Therefore, The Fourth Law of Data Quality states that:

“When establishing data quality standards, you must include both objective data quality and subjective information quality.”
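One way to picture this law is as two layers of checks: a shared set of objective checks that every business unit starts from, plus use-specific rules layered on top. The following is only a minimal sketch under stated assumptions; the field names (customer_id, postal_code, email, tax_id), the InformationStandard class, and the rules themselves are hypothetical illustrations, not a prescribed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical fields that every use of the data is assumed to depend on.
OBJECTIVE_REQUIRED_FIELDS = ("customer_id", "postal_code")


def objective_checks(record: Record) -> List[str]:
    """Objective data quality: use-independent checks shared by the enterprise."""
    return [
        f"missing {field}"
        for field in OBJECTIVE_REQUIRED_FIELDS
        if not record.get(field)
    ]


@dataclass
class InformationStandard:
    """Subjective information quality: rules specific to one business use."""
    name: str
    rules: Dict[str, Callable[[Record], bool]]

    def evaluate(self, record: Record) -> List[str]:
        # Start from the objective data foundation, then apply the
        # rules that matter only for this particular use.
        issues = objective_checks(record)
        for label, rule in self.rules.items():
            if not rule(record):
                issues.append(f"{self.name}: {label}")
        return issues


# Two hypothetical business units with different fitness requirements.
marketing = InformationStandard(
    "marketing", {"has email": lambda r: bool(r.get("email"))}
)
billing = InformationStandard(
    "billing", {"has tax id": lambda r: bool(r.get("tax_id"))}
)

record = {
    "customer_id": "C-1",
    "postal_code": "02134",
    "email": "a@example.com",
    "tax_id": "",
}

marketing_issues = marketing.evaluate(record)
billing_issues = billing.evaluate(record)
```

In this sketch the same record passes the objective checks and the marketing rules but fails the billing rule, which is the point of the law: one objective foundation, with fitness judged separately for each use.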

Remarkable Data Quality

As Seth Godin explained in Purple Cow: Transform Your Business by Being Remarkable, the opposite of “remarkable” is not “bad” or “mediocre” or “poorly done.”

The opposite of remarkable is “very good.”

In other words, don’t just establish data quality standards and set goals to meet them.

Your goal should be to exceed your goals.

Perfection is impossible—but remarkable data quality is not.

Be remarkable.

2 Comments

Julian Schwarzenbach on July 15, 2010 4:26 pm

Jim,

Another excellent post (as always).

I agree that data needs to be fit for purpose, but where multiple business purposes are involved, then agreeing a standard that is appropriate for all purposes will be challenging.

The flip side of this is that there is likely to be some data which has no business purpose (although it may have had one at some point in the past). This matches a key data quality attribute which sometimes gets forgotten about - relevance. When covering the topic of relevance in presentations I use a cobweb image - which of your data items have got cobwebs??

Julian

Jim Harris on July 19, 2010 9:35 am

Thanks for your comment, Julian.

I agree that data relevance is an often forgotten data quality attribute. In The First Law of Data Quality, I quoted Tom Redman: “it is a waste of effort to improve the quality of data no one ever uses.”

As you noted, data which has no current use (i.e., business purpose) likely did have one at some point in the past. However, the default approach in data management has historically (pun intended) been to manage all of the data.

With data silos continuing to replicate data as well as the growing volumes of new data created daily, managing all of the data is not only becoming impractical, but because we are too busy with the activity of managing it, no one is stopping to evaluate usage or relevance.

Therefore, the data and the information that the enterprise truly needs for continued success may be stuck in the long line of data waiting to be managed--and most likely in line behind data no one even uses anymore.

Cheers,

Jim
