A short brainstorm for crowdsourcing data quality

If you’re reading this, there’s a strong chance your organisation is on the road to data quality management maturity. One of the challenges you’ll inevitably face is how to deal with all the defects you discover.

Many data quality problems can be "cleansed" instantly using appropriate technology, but for plenty of issues we simply don’t have enough context to determine what the correct value should be.

For example, if your data relates to the power industry, you can infer a great deal about a power plant’s power rating, capacity and manufacturer – but there is also a great deal of data that is specific to the installation and operation of that particular kit.

The same applies to customer data. We may have conflicting address or contact details for a customer, but it’s not always easy to specify what the accurate version should be.

In data quality there are often shades of grey. How do we manage data quality improvements in this situation?

Conventionally, the approach is to run major cleanses or campaigns to get the data up to an acceptable level. This obviously comes at a cost, and it can be a big hit on the business if staff spend all day doing reality checks on data.

There is another, more organic method your organisation could employ. It’s not right for every situation, but it can work well for data that has low volatility and requires human judgement to correct.

If you type "data profiling" into Wikipedia, you’ll currently see the following text at the top:

This article has multiple issues. Please help improve it or discuss these issues on the talk page.

The topic of this article may not meet Wikipedia's general notability guideline. (August 2010)

This article needs additional citations for verification. (August 2010)

These three lines give you a simple blueprint for how you could organically improve historical data defects in your organisation.

Imagine that a business user brings up a customer management application screen and is faced with a message highlighting a problem with the information on display. The business user now has context around how they can use that data. Perhaps they brought up the screen because a customer is on a call with them, so they can instantly populate the missing details the system has flagged up.
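To make that concrete, here is a rough sketch in Python of how a screen could turn flagged defects into a Wikipedia-style banner, complete with "fix" and "discuss" links. All of the names here (Issue, dq_issues, banner_for_customer, the URLs) are invented for illustration and are not tied to any particular product.

from dataclasses import dataclass

@dataclass
class Issue:
    field: str        # e.g. "postcode"
    problem: str      # e.g. "is missing"
    fix_url: str      # link to correct the value in place
    discuss_url: str  # link to a talk-page style discussion thread

# Sample findings, as if written by an overnight data quality job.
dq_issues = {
    "CUST-001": [
        Issue("postcode", "is missing",
              "/customers/CUST-001/edit#postcode",
              "/discuss/CUST-001/postcode"),
    ],
}

def banner_for_customer(customer_id):
    """Build Wikipedia-style banner messages for a customer screen."""
    return [
        f"This record's '{i.field}' {i.problem}. "
        f"Fix it: {i.fix_url} | Discuss: {i.discuss_url}"
        for i in dq_issues.get(customer_id, [])
    ]

print(banner_for_customer("CUST-001"))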

Consider a management report that has an addendum highlighting all the data quality defects currently impacting the report. The reader now has context on whether the report can be trusted.
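In the same spirit, here is a small sketch of how that addendum could be built, assuming each defect records an issue type against a record ID. The report simply counts the open issues behind the rows it presents; again, the names and sample data are made up for the example.

from collections import Counter

def report_addendum(report_record_ids, issues_by_record):
    """Summarise the open data quality issues behind a report's source records."""
    counts = Counter(
        issue_type
        for record_id in report_record_ids
        for issue_type in issues_by_record.get(record_id, [])
    )
    lines = ["Data quality addendum:"]
    lines += [f"  {issue_type}: {count} record(s) affected"
              for issue_type, count in counts.most_common()]
    return "\n".join(lines)

print(report_addendum(
    ["CUST-001", "CUST-002"],
    {"CUST-001": ["missing postcode"],
     "CUST-002": ["conflicting phone numbers"]},
))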

What I particularly like about the Wikipedia example is that it provides simple links to improve the content directly on the page or discuss the issues with others in a forum. This could easily be implemented in a traditional application.

I’m sure some of you will be thinking: "How does this impact our existing investment in data quality technology or processes?"

The answer is that it doesn’t replace anything; it simply extends your data quality offering. For example, your data quality tools can still perform the benchmarking and flagging of those records that need improvement. However, by adapting your application screens slightly to include healthcheck information, you can go one step further and directly involve business users to keep data accurate and clean.
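As a rough illustration of that division of labour, the sketch below stands in for the flagging step – in reality your data quality tool does this far more thoroughly. The only new idea is that the findings are written somewhere the application screens can read; the function and field names are invented for the example.

def flag_customers(customers, issue_store):
    """Run simple checks and record the findings for the screens to display."""
    for customer in customers:
        findings = []
        if not customer.get("postcode"):
            findings.append("postcode is missing")
        phones = {customer.get("phone_billing"), customer.get("phone_contact")}
        phones.discard(None)
        if len(phones) > 1:
            findings.append("conflicting phone numbers")
        if findings:
            issue_store[customer["id"]] = findings

store = {}
flag_customers(
    [{"id": "CUST-001", "postcode": "",
      "phone_billing": "555-1234", "phone_contact": "555-9999"}],
    store,
)
print(store)  # {'CUST-001': ['postcode is missing', 'conflicting phone numbers']}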

This organic, crowdsourced-style approach means you will naturally focus improvements on your regularly accessed data, which can only be a good thing for your business.

About Author

Dylan Jones

Founder, Data Quality Pro and Data Migration Pro

Dylan Jones is the founder of Data Quality Pro and Data Migration Pro, popular online communities that provide a range of practical resources and support to their respective professions. Dylan has an extensive information management background and is a prolific publisher of expert articles and tutorials on all manner of data related initiatives.
