For decades, data quality experts have been telling us poor quality is bad for our data, bad for our decisions, bad for our business and just plain all around bad, bad, bad – did I already mention it’s bad?
So why does poor data quality continue to exist and persist? Have the experts been all talk, but with no plan for taking action? Have the technology vendors not been evolving their data quality tools to become more powerful, easier to use, and more aligned with the business processes that create data and the technical architectures that manage data? Have the business schools been unleashing morons into the workforce who can’t design a business process correctly? Have employees been intentionally corrupting data in an attempt to undermine their employers’ success? Wouldn’t a perfectly rational organization be free of poor data quality?
One of my favorite nonfiction books is Predictably Irrational by Dan Ariely, which provides a good introduction to behavioral economics, a relatively new field combining aspects of both psychology and economics. The basic assumption underlying standard economics is that we will always make rational decisions in our best interest, often justified by a simple cost-benefit analysis. Behavioral economics more realistically acknowledges that we are not always rational – and, most important, our irrationality is neither random nor senseless, but quite predictable when the complex psychology of human behavior is considered.
The basic assumption underlying most theories of data quality is that, because the business benefits of high-quality data are obvious when compared to the detrimental effects of poor quality, any people, processes or technology that allow poor data quality must either be acting irrationally or be somehow defective.
By that logic, preventive measures, once put into place, will correct the problem and eliminate any need for future corrective action, such as data cleansing. Everything, and everyone, will then be rational and wonderful in a world of perfect data quality.
However, people are far from perfect and they are often one of the root causes of data quality problems, such as when people assume data quality is someone else’s responsibility. David Loshin has recently been blogging about behavior engineering and behavior modification. I like using the term behavioral data quality to describe the necessary inclusion of aspects of psychology within the data quality profession.
Ariely’s book explains the dangers of not testing our intuitions, thinking we can always predict our behavior, and assuming our behavior will always be rational. Recognizing these flawed perspectives can help us understand the root causes of our predictably poor data quality. Most important, it can help us develop far more effective tactics and strategies for implementing successful and sustainable data quality improvements.
1 Comment
Over the past 10 years, books on human behavior have seen a surge in popularity. The writers satisfy our need to be “voyeurs” using pseudo-analysis and research on the way humans behave.
However entertaining these books are, these works of fiction shouldn’t be taken literally. Using these books as a basis to change behaviors or, worse yet, to “engineer” behavioral change returns us to the era of Taylor.
Time and motion was the driving force during that era. Now that we are inundated and enamored with data we believe we can “engineer” human behavior. We believe the data exposes human behavior that can be manipulated.
The vicious circle is obvious. Unpredictable humans create data. Unpredictable humans exploit this data to effect change in the unpredictable human behavior. It’s like claiming we are changing the weather using weather predictions.
If data quality were important and relevant to humans, they would not require behavioral engineering. This is where technologists step outside their competencies and try to become psychologists using popular books as a backdrop. Technologists have not yet developed their own quality practices and principles when designing data. They have inflicted bad data practices on the environment and now profess that data quality is needed.
The data is polluted because of inadequate data engineering disciplines (quality by design). Perhaps the first step should be to engineer the behavioral change of the engineers! The basic root cause of most data quality problems is that data is badly designed.