“You don't talk about data quality.”
No, wait—that's The First Rule of Poor Quality Data.
The First Law of Data Quality:
“Data is either being used or waiting to be used—or wasting storage and support.”
Although understanding your data is essential to using it effectively and improving its quality, as Thomas Redman explains, “it is a waste of effort to improve the quality of data no one ever uses.”
Therefore, investigate your data usage by asking the following six questions:
1. Where did the data originate?
Data is like Tribbles. The trouble with Tribbles is that before you know what happened, you have way too many to handle. Every enterprise system seems to have more data than anyone thought humanly possible (or more than even Vulcans could think possible), and your data volumes are continuing to grow at alarming rates.
Your Kobayashi Maru of Data Usage begins with establishing data lineage. Did the data originate from an internal or external source? How many copies of the data exist? Is there a single system of record or a preferred source system for the data?
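The lineage questions above can be written down as one small record per data set. Here is a minimal sketch in Python. The class name `LineageRecord`, its fields, and the example system names are all hypothetical and only meant for illustration, not a reference to any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LineageRecord:
    """Answers to the 'where did the data originate?' questions (hypothetical structure)."""
    dataset: str
    origin: str                                        # "internal" or "external"
    copies: List[str] = field(default_factory=list)    # systems known to hold a copy
    system_of_record: Optional[str] = None             # None if no preferred source is agreed

    def open_questions(self) -> List[str]:
        """List the lineage questions that are still unanswered or need follow-up."""
        issues = []
        if self.origin not in ("internal", "external"):
            issues.append("origin is unknown")
        if not self.copies:
            issues.append("no known copies recorded")
        if self.system_of_record is None:
            issues.append("no system of record or preferred source")
        elif self.system_of_record not in self.copies:
            issues.append("system of record is not among the known copies")
        return issues


# Example data set (made-up names): three copies, but nobody has agreed on a source.
customer = LineageRecord(
    dataset="customer_master",
    origin="internal",
    copies=["crm", "billing", "warehouse"],
)
print(customer.open_questions())  # ['no system of record or preferred source']
```

Even a record this simple makes it obvious which data sets still have no agreed source system.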
2. Why was the data received?
With external data, it is often easier to both identify the source and understand its intended purpose. For example, reference files are typically purchased to either enrich or validate master data attributes.
With internal data, this can be more challenging. Data warehouses and master data management hubs might be staging all operational and legacy data sources for their subject areas, even though some of this data isn't actually being used.
For example, the reasons why financial transaction data was received are perhaps more obvious than the reasons for other types of data. However, it's always important to determine exactly why you are receiving data.
3. When is the data applicable?
Similar to radioactive elements, all data has a limited shelf life. All data decays, but not necessarily at the same rate.
There are many different dates associated with data. Knowing accurate creation, update, effective, expiration, and other available dates can help estimate the timeframe during which data will be applicable for its intended usage.
Just because storage has become less expensive doesn't mean your organization should keep data forever. Knowing its shelf life can help indicate when data should be archived or possibly even deleted.
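As a rough illustration of using dates to judge shelf life, here is a small Python sketch. The function name `shelf_life_status`, the four status labels, and the 365-day retention window are assumptions made up for this example. Your own retention policy will differ, and data with no expiration date is simply treated as still applicable.

```python
from datetime import date
from typing import Optional


def shelf_life_status(effective: date, expiration: Optional[date],
                      as_of: date, retention_days: int = 365) -> str:
    """Classify data as 'pending', 'applicable', 'archive', or 'delete'.

    retention_days is an assumed policy: how long expired data is kept
    in an archive before it becomes a candidate for deletion.
    """
    if as_of < effective:
        return "pending"          # not yet in effect
    if expiration is None or as_of <= expiration:
        return "applicable"       # no known expiration, or not yet expired
    days_expired = (as_of - expiration).days
    return "archive" if days_expired <= retention_days else "delete"


today = date(2024, 6, 1)
print(shelf_life_status(date(2023, 1, 1), date(2023, 12, 31), today))  # archive
print(shelf_life_status(date(2020, 1, 1), date(2021, 1, 1), today))    # delete
print(shelf_life_status(date(2023, 1, 1), None, today))                # applicable
```

The point is not the specific thresholds, but that the decision to archive or delete becomes explicit once the dates are known and trusted.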
4. Who is the data describing?
As Peter Benson of the ECCMA explains, “data is intrinsically simple and can be divided into data that identifies and describes things, master data, and data that describes events, transaction data.”
Who are the “things” and “events” your master and transaction data are describing?
5. What does the data mean in business terms?
Business meaning is not entirely limited to the company's bottom line. However, the costs, risks, and revenue associated (directly or indirectly) with your data are the minimum requirements for this assessment. Additional aspects could include a matrix of business units and business processes associated with the data.
6. How can the data be used to make business decisions?
It can be argued that this is the fundamental question behind all data usage.
Ultimately, the success of an organization is measured by the results of its actions, which were based on its decisions, which were based on the information derived from its data. Therefore, the true purpose of data is to serve as a solid foundation for sound business decisions.
How data is being used is more important than the business processes that create it and the technical processes that manage it.
This is especially true when evaluating the potential ROI of data quality improvements. Ensuring the data being used to make critical business decisions is reliable and accurate is why data quality is so important to your organization.
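One hedged way to put this into practice is to rank data sets by how often they actually feed business decisions, then focus quality efforts on the few that account for most of the usage. The Python sketch below assumes you already have some usage count per data set. The function name `most_used`, the 80% default, and the example numbers are hypothetical, not measurements.

```python
from typing import Dict, List


def most_used(usage_counts: Dict[str, int], share: float = 0.8) -> List[str]:
    """Return the smallest set of top-ranked data sets covering `share` of total usage."""
    total = sum(usage_counts.values())
    if total == 0:
        return []                 # nothing is used at all
    selected: List[str] = []
    covered = 0
    ranked = sorted(usage_counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, count in ranked:
        if covered >= share * total:
            break                 # target share already reached
        selected.append(name)
        covered += count
    return selected


# Made-up counts of decisions (reports, queries) that relied on each data set.
usage = {
    "orders": 500,
    "customers": 350,
    "web_logs": 100,
    "fax_numbers": 50,
    "legacy_codes": 0,
}
print(most_used(usage))  # ['orders', 'customers']
```

Data sets that never make the list, especially those with zero usage, are the ones to question before spending any effort on improving their quality.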
Do you understand the where, why, when, who, what, and how of your data usage?
Good stuff, Jim, even for my Star Trek-limited mind...
1. What's the preferred source system for the data?
Unfortunately, I have seen organizational tribes bicker over this, putting their own interests ahead of the common good.
2. How can the data be used to make business decisions?
While it is the exception rather than the rule, I have lamentably seen some people never ask this simple but important question. Call it not seeing the forest for the trees, but many folks get so caught up in data-based arguments (no pun intended) that the elephant in the room remains unaddressed.
“My data quality mind to your mind.
My data quality thoughts to your thoughts.”
Data Quality Mind-Meld complete . . .
Excellent points, Phil – thanks for your comment!
Another great post Jim,
No. 5. What does the data mean in business terms? - is a key one for me. There is no better way to understand the value of your data. Actually, I will go as far as to say there really is no other way to get an understanding of the value of your data.
Without that understanding you will struggle to form a persuasive business case for DQ initiatives.
And, imagine a mind-meld between Phil Simons and Jim Harris, too frightening to contemplate! "Beam me up Scotty!"
Thanks for your comment Charles,
I definitely agree with your refinement of No. 5 - the easiest way to get ignored when you are talking about the importance of data quality is to ignore the business context - especially since the business context is the ONLY context for data - otherwise you might as well simply collect Tribbles.
P.S. Good point about the mind-meld - although perhaps then Phil and I could collectively grow enough hair to compete with you - Ka'Pla! (Roughly translated, in Klingon that means "Good luck with that!")
I really connect with this post Jim, great tips.
I'm a massive Pareto fan when it comes to data quality, and one of the first things I do is to show people the "hotspots" in their data landscape, i.e., which data is driving their business. This really helps to shape the data quality discussion because all of a sudden this huge mountain to climb can quickly become a much smaller challenge.
I think what you've got here is a simple methodology for questioning the usage of the data, forcing the accountability back onto the business - do you really need this? Can you justify the ongoing cost and hassle of maintaining this info?
So, plenty of ways to use what you've listed and run with it, great post.
Thanks for your comment Dylan,
I like the term “hotspots” – from now on, I will explain this process as the need to dig through your giant pile of Tribbles in order to find your Hotspots. :-)
Yes, all too often I have seen clients climbing huge data mountains just “because it was there” – without asking if anyone even uses this data – or uses it anymore, since the data probably was needed and used at some point.
So yes, let’s neither climb every data mountain nor make a mountain out of every data molehill.
Let’s instead investigate and identify the data most critical to daily business decisions and make sure that data is as reliable and accurate as possible.
>>Similar to radioactive elements, all data has a limited shelf life. All data decays, but not necessarily at the same rate.
Maybe true, but some data has a very, very slow rate of decay. Example: The geologic data from an oil well drilled in the 1950s is just as pertinent today as it was then. Prospects and interpretation techniques have changed, but the geology remains the same.
Thanks for your comment Phillip,
You have provided an excellent example of useful data with a very long shelf life.
Additionally, you raise the excellent point about how information (i.e., the interpretation of data) usage also requires comprehensive investigation.