How to improve on "5 Whys" when analyzing data quality defects

The “5 Whys” technique is one of the most commonly cited analysis tools in popular data quality texts, and for good reason.

First, it's a simple tool to teach. It takes only a few minutes to explain the basic workings of the 5 Whys approach: start with a known data quality problem, ask why that problem exists, and work back towards its source based on the answers provided.

Second, the 5 Whys technique is easy to use. It can work in both technical and business settings. Over the phone, via email, in workshops -- it is highly practical.

The problem with 5 Whys is that a single line of questioning can easily lead you to believe that the issue comes down to one fault. In reality, data quality problems are often complex, with social, technical, organisational, environmental, architectural and even regulatory causes.

Also, people are notoriously cautious when they feel that they’re laying blame. Alternatively, they can go the opposite way and openly criticize groups and individuals based on historical bias. Neither situation is ideal.

How can you improve the 5 Whys technique?

The first step is to gather solid data and analysis before you start the 5 Whys process. Construct a detailed view of the information chain surrounding the observed defect and trace it back across the data lineage. Data quality tools are well suited to this, particularly those that support relationship analysis and measurement.

Build up a rich picture of the data and structures surrounding your problem so that when you begin interviewing, you have far more information to verify and expand upon with your questions.
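As a rough illustration of tracing a defect back across the data lineage, here is a minimal Python sketch. The dataset names and the `feeds` mapping are hypothetical, invented for this example; in practice the lineage would come from your data quality or metadata tooling.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets that feed it.
feeds = {
    "crm_customer": ["call_centre_entry", "application_form_load"],
    "application_form_load": ["scanned_forms"],
    "call_centre_entry": [],
    "scanned_forms": [],
}

def upstream(dataset, feeds):
    """Breadth-first walk from the defective dataset back to its sources."""
    seen, order = {dataset}, []
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for parent in feeds.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order

# Every upstream dataset is a place to gather evidence before the interviews.
sources = upstream("crm_customer", feeds)
print(sources)
```

Each dataset in the result is a candidate for profiling, and each one points to people worth interviewing.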

Next, you need to break the linearity of the 5 Whys process by asking multiple lines of questions. For example, if you’re observing duplicate customer names in a CRM system, you should first understand whether the duplicates are common to all customer types. Is there any bias towards a particular type of record or source of creation?
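One way to check for that kind of bias before the interviews is to compare duplicate rates across groups. The sketch below assumes a hypothetical CRM extract of (name, customer type, source of creation) tuples and treats an exact, case-insensitive name match as a duplicate, which is a much cruder rule than a real matching tool would use.

```python
from collections import Counter, defaultdict

# Hypothetical CRM extract: (customer_name, customer_type, source_of_creation)
records = [
    ("Ann Lee", "retail", "call_centre"),
    ("Ann Lee", "retail", "application_form"),
    ("Bob Roy", "retail", "application_form"),
    ("Cy Dunn", "business", "call_centre"),
    ("Cy Dunn", "business", "application_form"),
    ("Di Park", "business", "web"),
]

def duplicate_rate_by(records, field_index):
    """Share of records in each group whose name occurs more than once overall."""
    name_counts = Counter(rec[0].strip().lower() for rec in records)
    totals = defaultdict(int)
    dupes = defaultdict(int)
    for rec in records:
        key = rec[field_index]
        totals[key] += 1
        if name_counts[rec[0].strip().lower()] > 1:
            dupes[key] += 1
    return {key: dupes[key] / totals[key] for key in totals}

by_type = duplicate_rate_by(records, 1)    # grouped by customer type
by_source = duplicate_rate_by(records, 2)  # grouped by source of creation
print(by_type)
print(by_source)
```

In this made-up sample the rate is the same for both customer types but differs sharply by source, which would point the questioning towards how records are created, not towards who the customers are.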

For example, you may have duplicates when a call centre adds a new customer who has already sent their details via an application form. Asking 5 Whys of a technician might suggest that the problem is due to the de-duplication software not performing correctly, because that’s their area of responsibility. But another line of questioning would hopefully address why the call centre process didn’t flag the fact that the customer already has an account. In effect, you have to create separate branches -- one looking at why the de-duplication software needs fine-tuning and the other looking at why the call centre process is failing.
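The branching idea can be recorded as a simple tree instead of a single chain. The following Python sketch is only one possible way to capture it; the `Why` class and `candidate_causes` function are hypothetical names made up for this illustration, and the answers are paraphrased from the example above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Why:
    """One answer to a "why?" question; children are the follow-up answers."""
    answer: str
    children: List["Why"] = field(default_factory=list)

    def ask(self, answer):
        """Record a follow-up answer as a new branch and return it."""
        child = Why(answer)
        self.children.append(child)
        return child

def candidate_causes(node, path=()):
    """Return every root-to-leaf line of questioning as a list of answers."""
    path = path + (node.answer,)
    if not node.children:
        return [list(path)]
    lines = []
    for child in node.children:
        lines.extend(candidate_causes(child, path))
    return lines

root = Why("Duplicate customer names in the CRM")
tech = root.ask("De-duplication software missed the match")
tech.ask("Matching rules need fine-tuning")
process = root.ask("Call centre created a new record for an existing customer")
process.ask("Call centre process did not flag the existing account")

lines = candidate_causes(root)
for line in lines:
    print(" -> ".join(line))
```

Keeping both branches in one structure makes it harder for a single interviewee's area of responsibility to define the whole investigation.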

So, in summary, to improve 5 Whys, gather evidence and analysis with appropriate automated data quality tools before your investigation begins. Then expand the traditional, linear approach by branching off new lines of questioning so you don't narrow down the causes too early. With this approach, your root-cause process should operate far more smoothly and effectively.
