The impact of data quality reach


One of the common traps I see data quality analysts falling into is measuring data quality in a uniform way across the entire data landscape.

For example, you may have a transactional dataset containing hundreds of records with missing values or badly formatted entries. In contrast, you may have a master dataset, such as an equipment master or location master, with only a handful of poor quality records.

When using standard data profiling metrics, it is easy to assume that the transactional dataset performs the worst as it contains the highest volume of errors. In fact, the master dataset could have a far greater negative impact on the organisation because of its "data quality reach."
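
To make that concrete, here is a minimal profiling sketch in Python with pandas. The table names, validity rule and figures are invented purely for illustration; the point is that counting defects row by row makes the large transactional table look like the bigger problem.

```python
import pandas as pd

# Invented example data: a large transactional table with many badly
# formatted phone numbers, and a small equipment master with one defect.
transactions = pd.DataFrame({
    "order_id": range(1, 1001),
    "contact_phone": ["0770090012" if i % 10 == 0 else "07700900123"
                      for i in range(1, 1001)],
})
equipment_master = pd.DataFrame({
    "equipment_id": ["EQ-001", "EQ-002", "EQ-003", "EQ-004"],
    "ticker": ["VOD.L", "BP.L", None, "HSBA.L"],
})

# Standard profiling: count records that fail a simple validity check.
bad_phones = (~transactions["contact_phone"].str.fullmatch(r"07\d{9}")).sum()
missing_tickers = equipment_master["ticker"].isna().sum()

print(f"Transactional defects: {bad_phones}")      # 100 of 1,000 rows
print(f"Master data defects:   {missing_tickers}") # 1 of 4 rows
```

On raw counts the transactional table loses badly, yet nothing in these numbers tells you how far each defect travels.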

In the transactional dataset, any mistakes will typically be localised to an individual order or transaction. For example, imagine that I want to have my car serviced. I call the service centre to arrange a time for collection and drop-off. The operator takes my mobile phone number for any servicing queries and to arrange a time for delivery after the service. A fault is found on the vehicle, so the mechanic tries to call my mobile to discuss the options and request approval to proceed. Unfortunately, the call operator had missed off the final digit of my mobile number, so the call can’t be made. The mechanic is now forced to delay the service process as he trawls through the administration system to find the last known contact number.

Eventually, I’m called on the correct number, but an unnecessary delay has been added to the service. Because the service is fixed price, it’s the garage that bears this cost. The impact of this issue is localised: only one transaction was affected.

Now consider a master data example. A financial services firm obtains a master dataset of financial information for hundreds of companies. One record is found to be defective: it has an incorrect ticker symbol. The company data is merged into the existing ‘golden record’ set of data, but one company record doesn’t receive the updated information because the ticker symbols don’t match. The company is listed on the FTSE 100, and the updated data was vital to various financial calculations around the organisation. From one seemingly innocuous error, significant issues are felt around the firm.
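
As a rough sketch of how that kind of silent mismatch plays out (again in pandas, with invented tickers and figures), a left join onto the golden records keeps every company, but the one whose key was mistyped quietly retains its stale values unless you explicitly check for unmatched keys:

```python
import pandas as pd

# Invented golden records and an incoming update with one bad ticker.
golden = pd.DataFrame({
    "ticker": ["HSBA.L", "VOD.L", "BP.L"],
    "market_cap_bn": [120.0, 25.0, 70.0],
})
incoming = pd.DataFrame({
    "ticker": ["HSBA.L", "VOD.L", "BP."],   # defective key for BP
    "market_cap_bn": [125.0, 24.0, 75.0],
})

# A left join keeps every golden record, but the row whose key fails to
# match silently falls back to its stale value.
merged = golden.merge(incoming, on="ticker", how="left",
                      suffixes=("_old", "_new"))
merged["market_cap_bn"] = merged["market_cap_bn_new"].fillna(
    merged["market_cap_bn_old"])

# Surface the unmatched keys rather than letting them pass unnoticed.
stale = merged[merged["market_cap_bn_new"].isna()]
print(stale[["ticker", "market_cap_bn"]])   # BP.L still holds the old figure
```

The choice of join matters here: an inner join would have dropped the mismatched company entirely and hidden the problem, whereas a left join at least leaves a detectable gap.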

This impact of "data quality reach" is why it is so important to weight your data quality metrics instead of just relying on basic profiling metrics. Taking this approach forces you to explore data quality holistically. It requires you to start developing a clearer view of data lineage and the downstream usage of data so you can fully understand the context and criticality of your data.
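
One simple way to express that weighting (the dataset names, defect counts and consumer counts below are entirely made up) is to scale each dataset's raw error rate by a crude measure of its reach, such as the number of downstream processes that consume it:

```python
# Illustrative only: weight each dataset's raw error rate by a crude
# "reach" factor, here the number of downstream consumers of that data.
datasets = {
    # name: (defect_count, record_count, downstream_consumers)
    "service_orders":   (100, 10_000, 1),
    "equipment_master": (1,   500,    40),
}

for name, (defects, records, consumers) in datasets.items():
    raw_rate = defects / records
    reach_weighted = raw_rate * consumers
    print(f"{name:17} raw rate {raw_rate:.2%}  "
          f"reach-weighted score {reach_weighted:.3f}")
```

A real weighting would come from your own lineage and usage analysis rather than a single multiplier, but even this crude score reverses the ranking that raw profiling suggested.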


About Author

Dylan Jones

Founder, Data Quality Pro and Data Migration Pro

Dylan Jones is the founder of Data Quality Pro and Data Migration Pro, popular online communities that provide a range of practical resources and support to their respective professions. Dylan has an extensive information management background and is a prolific publisher of expert articles and tutorials on all manner of data-related initiatives.


1 Comment

  1. Is it ever safe to assume that reference data is correct and complete? While inner joins are clearly more efficient, wouldn't left joins or outer joins be more appropriate from a data quality analysis perspective?

