Jim Harris recently penned an interesting article describing what happens to data quality at the top of the bell curve. The central theme is that, as we strive for ever greater levels of quality, we hit diminishing returns. For example, the cost of sending an engineer down a tunnel to check that every cable is correctly labelled can be far too high.
Finding the threshold for data quality improvement is an important decision for data quality leaders and their sponsors within the business. So how do companies make that decision?
In one of my earliest data quality interviews on Data Quality Pro with KFR Services, I discovered they linked the reduction of defects to staff bonuses:
“...with the [data quality] scheme we award a bonus if one of our staff reaches a data quality goal. We try and tie their goals to the overall company goal. So if our corporate goal was a 50% reduction in defects we would make the individual’s goal to be cutting their own recorded defects in half. This motivates the team as no-one wants to see a month where defects occur, it definitely helps the entire team to keep the figures up.”
They adopted a process of cutting defects by 50% at regular intervals. As a result, they edged closer and closer to perfection. They made their data quality statistics public so that everyone in the company lived and breathed data quality.
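To make the arithmetic concrete, here is a minimal sketch in Python (using entirely hypothetical defect counts, not KFR Services' actual figures) of how a "halve your recorded defects" target plays out: the targets fall geometrically, and the absolute improvement available in each period shrinks fast.

```python
# A minimal sketch (hypothetical figures) of how a "cut your recorded defects in half"
# target approaches zero over successive periods.

def halving_targets(current_defects: int, periods: int) -> list[int]:
    """Return the defect target for each period if the goal is a 50% reduction each time."""
    targets = []
    for _ in range(periods):
        current_defects = current_defects // 2  # each period's goal: half of the last count
        targets.append(current_defects)
    return targets

if __name__ == "__main__":
    # Starting from 400 recorded defects, the absolute improvement shrinks each period:
    # 200, then 100, then 50 ... the classic diminishing-returns curve.
    print(halving_targets(400, 6))  # [200, 100, 50, 25, 12, 6]
```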
However, I see a challenge here. As defects become harder and harder to find, how do you incentivise staff to keep going? If they’re rewarded for defect reduction, could this lead to gaming of the system? It’s an interesting dilemma. In the above case it didn’t happen, as it’s a small, close-knit business, but I can imagine this posing a problem in a mega-corporate.
I think an important aspect of the "getting near zero" debate was summed up in my "mantra of zero-defect data migration" article last year. Just because you’re not dealing with the defects doesn’t mean you shouldn’t measure and manage them.
What do I mean?
On a visit to a telecoms site I once discovered that something like 35% of equipment had data quality issues. A lot of equipment didn’t have a serial number recorded, for example. In data quality terms that is a defect, since ideally every serial number would be captured. However, it isn’t mission-critical: the business can still operate, and the cost of going out on site to find all those serial numbers is prohibitive. You could say they had reached the top of the bell curve, and the law of diminishing returns meant it wasn’t going to be addressed.
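For illustration, a completeness check of this kind is simple to express. The sketch below uses hypothetical records and field names (not the telecoms company's actual schema) to measure what percentage of equipment records have no serial number.

```python
# A rough sketch of the kind of completeness measurement behind that 35% figure.
# Records and field names here are hypothetical, purely for illustration.

equipment_records = [
    {"asset_id": "EQ-001", "serial_number": "SN12345"},
    {"asset_id": "EQ-002", "serial_number": None},
    {"asset_id": "EQ-003", "serial_number": ""},
]

def missing_serial_rate(records: list[dict]) -> float:
    """Percentage of equipment records with no serial number recorded."""
    missing = sum(1 for record in records if not record.get("serial_number"))
    return 100.0 * missing / len(records)

print(f"{missing_serial_rate(equipment_records):.1f}% of records have no serial number")
```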
However, that doesn’t mean it shouldn’t be measured and managed.
Management still need to know how many serial numbers are blank. They can still build remedial steps into the everyday engineering workflow so that engineers capture the missing data when they’re on site. You still need work instructions and policies to cope with the missing data. You can’t just leave it to chance.
Going back to the earlier data migration example, I often see project leaders taking the stance of "Ah well, bad data, fact of life, let’s shunt the data into the target and deal with the fallout later. C’est la vie, right?" No, this is not right. You still need to know which data will work for the migration and which data will cause a failure. Defective data still needs to be measured and managed, because if we don’t do this we inject variation into our service, and that is where the costs creep in.
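As a rough illustration of what that pre-migration measurement might look like, the sketch below classifies records under assumed, hypothetical validation rules: those that will load cleanly, those that carry a known but tolerated defect, and those that will fail in the target system.

```python
# A sketch of pre-migration profiling under assumed, hypothetical rules: classify each
# record as clean, defective-but-loadable, or certain to fail in the target system,
# so the fallout is measured and managed rather than discovered after go-live.
from collections import Counter

def classify_record(record: dict) -> str:
    # Hypothetical rule: the target system rejects records without an asset_id.
    if not record.get("asset_id"):
        return "will_fail_migration"
    # Hypothetical rule: a missing serial number is a known, tolerated defect.
    if not record.get("serial_number"):
        return "loads_with_known_defect"
    return "clean"

records = [
    {"asset_id": "EQ-001", "serial_number": "SN12345"},
    {"asset_id": "EQ-002", "serial_number": None},
    {"asset_id": None, "serial_number": "SN99999"},
]

print(Counter(classify_record(r) for r in records))
# Counter({'clean': 1, 'loads_with_known_defect': 1, 'will_fail_migration': 1})
```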
So, here’s my view. Yes, it is often far too costly to resolve all your data quality issues, but you should still know what they are, what they’re costing you and how you are going to cope with them if and when they impact the business.
What do you think? How does your organisation cope with the diminishing returns of data quality improvement? I welcome your views.