In previous posts, I explained that both the act of measuring and the thing being measured are intrinsically fuzzy. In this post, I want to take on the common adage: “you can’t manage what you can’t measure.” As Charles Wheelan cautioned in his book Naked Statistics: Stripping the Dread from the Data, “you had better be darn sure that what you are measuring is really what you are trying to manage.”
Wheelan cited research on the challenges of managing school or teacher quality by measuring student scores on standardized tests. “Any evaluation of teachers or schools that is based solely on test scores will present a dangerously inaccurate picture,” Wheelan noted. The reasons include the vastly different backgrounds of the students as well as the varying skill levels of their teachers.
One humorous example of selection bias was the recent ranking of the best high schools in the Midwest. This ranking was based solely on student scores on standardized tests. The top-ranked high schools were selective enrollment schools, meaning that students must apply to get in, and only a small proportion of those students are accepted. One of the most important admissions criteria? Standardized test scores. So the schools being recognized as having students with excellent test scores are the schools that only admit students who have excellent test scores.
“This is the logical equivalent,” Wheelan quipped, “of giving an award to the basketball team for doing such an excellent job of producing tall students.”
As former United States Secretary of Defense Robert McNamara said, “We have to find a way of making the important measurable, instead of making the measurable important.”
When it comes to the challenges of managing data quality, make sure your data quality metrics are making the important measurable (i.e., the business impact of poor data quality), instead of making the measurable important (i.e., the metrics provided by data profiling tools).
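To make that distinction concrete, here is a minimal sketch in Python, using entirely hypothetical records and field names. A profiling tool might report that fields are mostly complete, while a business-impact metric shows that the few missing values sit exactly where the revenue is.

```python
# Hypothetical customer records: field names and figures are illustrative only.
customer_records = [
    {"email": "ann@example.com", "postal_code": "60601", "open_order_value": 250.0},
    {"email": None,              "postal_code": "60602", "open_order_value": 0.0},
    {"email": "bob@example",     "postal_code": None,    "open_order_value": 1200.0},
]

def profiling_metric(records):
    """Measurable, but not necessarily important: overall field completeness."""
    fields = ["email", "postal_code"]
    filled = sum(1 for r in records for f in fields if r[f])
    return filled / (len(records) * len(fields))

def business_impact_metric(records):
    """The important made measurable: share of open-order revenue we cannot
    ship because the postal code is missing."""
    at_risk = sum(r["open_order_value"] for r in records if not r["postal_code"])
    total = sum(r["open_order_value"] for r in records)
    return at_risk / total if total else 0.0

print(f"Field completeness: {profiling_metric(customer_records):.0%}")        # 67%
print(f"Revenue at risk from missing postal codes: "
      f"{business_impact_metric(customer_records):.0%}")                      # 83%
```

In this toy example, the profiling metric looks reassuring (two-thirds of the fields are populated), but the business-impact metric tells the story that matters: most of the open-order revenue is tied to records that cannot be shipped.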