"The Watchmaker nodded. Even an empty data set provides important information." --Kevin J. Anderson, Clockwork Angels
Sometimes sage words on data management come from unexpected sources.
A few weeks ago, I was reading an advance review copy of Clockwork Angels, the latest book from prolific sci-fi writer Kevin J. Anderson. Angels is completely fictional (actually, it's a novelization of the Rush album of the same name). However, the line above reminded me of my days consulting on large-scale CRM and ERP projects.
When profiling, converting or trying to load data, I would frequently query the database ahead of time. I was looking for potential problems like suspect or duplicate records, invalid values or remnants of issues that should have been resolved by now. And often my queries would return nothing – a.k.a., the null set.
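For the curious, here's roughly what those pre-load checks look like. This is only a sketch: I'm using Python's built-in sqlite3 module rather than the tools I had on those projects, and the employees table, its columns and the two check names are made up for illustration.

```python
import sqlite3

# Hypothetical table and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, ssn TEXT, pay_rate REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "111-11-1111", 25.0), (2, "222-22-2222", 31.5), (3, "333-33-3333", 18.75)],
)

# Each check is a query that should return nothing when the data is clean.
CHECKS = {
    "duplicate_ssn": (
        "SELECT ssn, COUNT(*) FROM employees GROUP BY ssn HAVING COUNT(*) > 1"
    ),
    "invalid_pay_rate": (
        "SELECT emp_id, pay_rate FROM employees "
        "WHERE pay_rate IS NULL OR pay_rate <= 0"
    ),
}


def run_checks(connection, checks):
    """Run every check and return a dict of check name -> list of offending rows."""
    return {name: connection.execute(sql).fetchall() for name, sql in checks.items()}


results = run_checks(conn, CHECKS)
for name, rows in results.items():
    status = "null set" if not rows else f"{len(rows)} suspect row(s)"
    print(f"{name}: {status}")
```

With the clean sample rows, both checks come back empty, which is exactly the result I was hoping to see before a load.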
Despite its moniker, it would be wildly inaccurate to say that the null set is worth zero. In fact, I'd argue the opposite. I can think of dozens of times (dozens!) in which running a query or standard report returned nothing – and it was the very lack of results that made the team and me more confident to proceed. By contrast, finding thousands of problematic records gave us a great deal of pause.
My favorite tricks with Microsoft Access involved using macros and a little VBA to spit out results to functional users in simple formats. For instance, in one large workbook, a payroll or finance manager would receive the results of multiple queries (as tabs). A quick look at those null results meant that all systems were go.
Simon Says
Don't get me wrong. Null sets don't always mean good news. Incorrectly written queries will fail to find legitimate issues. More to the point, you can't write queries to find every type of error in an enterprise system. (Business rules and logic and data validation provide much of that functionality.) Still, rest assured that there's plenty of value in zero.
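Here's one way a badly written query can hand you a false all-clear. It's a sketch using Python's built-in sqlite3 module, with a made-up employees table. The first query compares a column to NULL with an equals sign, which in standard SQL never matches anything, so it returns the null set even though two records are missing a department.

```python
import sqlite3

# Hypothetical table and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, dept TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Finance"), (2, None), (3, None)],
)

# Broken check: "= NULL" evaluates to NULL for every row, so nothing is returned.
broken = conn.execute(
    "SELECT emp_id FROM employees WHERE dept = NULL"
).fetchall()

# Correct check: "IS NULL" actually finds the rows with a missing department.
correct = conn.execute(
    "SELECT emp_id FROM employees WHERE dept IS NULL ORDER BY emp_id"
).fetchall()

print(broken)   # []
print(correct)  # [(2,), (3,)]
```

An empty result from the first query looks like good news, but it only tells you about the query, not the data.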
Feedback
What say you?
1 Comment
Hi Phil
Thanks. This is very true.
Your point about queries not finding errors is important, because we only find the things we design rules to find. We will find things we expect to find and things we don't, but we will not find things we don't write rules to find (the things we don't know we don't know).
I guess it comes down to finding the things that will have a material impact (i.e., the Pareto rule) and return value to the organisation in the form of successful migrations, upgrades, quality improvements, etc.