Many years ago I worked in the utilities industry, which has to rank as one of the most data-intensive sectors.
The challenge with any utilities data is that so much of it is connectivity-related. If one side of the connection is missing, then entire pathways can be lost and services impacted.
Within a utilities site are millions of connection points. There are physical connections when connecting a cable from one device to another, and there are logical connections - for example, where multiple services run over the same circuit.
When I joined the utilities firm I was told, several times, a story that is etched in data quality folklore. The tale centered on a site that had suffered a severe fire several years earlier. When they came to reinstate the equipment and connectivity services by working from the operational data, they found that up to 40% of equipment just wasn’t where it was meant to be.
That figure always felt a little inflated to me - until one day, during a data migration impact assessment exercise, I took my team onsite to complete the last phase of the assessment: an accuracy check, or reality check.
I was astounded by how out of touch with reality the data had become. Expensive hardware that was listed on the floor plans and connectivity data simply wasn’t in the right place. In some cases it wasn’t even on the same floor and couldn’t be found at all.
What was striking was how many of our assumptions turned out to be totally invalid.
For example, we assumed that the older equipment records would be far less accurate than the data for newer equipment. Not so. We found that a large amount of expensive, modern equipment just wasn’t where it was meant to be - while much of the ancient equipment still operated correctly in its recorded location.
The creation of stranded assets from inaccurate data became all too apparent when we discovered connections that were marked unavailable in the data but clearly had operational circuits running over them. We found equipment not recorded on any asset manifest and a broad range of associated data quality issues that meant this provider simply wasn’t getting the real value out of their data and equipment.
An accuracy check discovered that around 36% of the equipment just wasn’t recorded correctly, a figure reminiscent of my earlier 40% legend.
So what’s the moral here?
Well, of course inaccurate data costs the organisation. But it also costs the consumer. Companies still need to turn a profit, so faced with higher service costs and increased capital expenditure that revenue can only come from one place: customers.
We, the utilities consumer, are paying for those badly labelled power supplies and phantom connections.
But we also pay twice when serious events impact the services in these locations. Fires are not uncommon in utilities sites, and one wonders how much quicker services could be restored if data quality were taken far more seriously.
When you’re compiling your data quality assessments, don’t forget to look at the reality of your data. Inaccuracy is the most critical data defect - yet very few assessments pay it more than lip service. Yes, checking from a surrogate source is beneficial, but often there is nothing quite so illuminating as a reality check.
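The reality check described above can be expressed very simply: for each asset, compare where the records say it is against where the field audit actually found it. The sketch below illustrates the idea - the asset IDs, location strings, and data shape are all hypothetical, not from any real asset register.

```python
# A minimal sketch of a "reality check": compare recorded asset locations
# against what a field audit actually observed. All IDs and locations
# below are invented for illustration.

recorded = {
    "ASSET-001": "Floor 1 / Rack A3",
    "ASSET-002": "Floor 1 / Rack B1",
    "ASSET-003": "Floor 2 / Rack C7",
    "ASSET-004": "Floor 2 / Rack D2",
}

# Locations observed during the onsite audit (None = not found at all)
observed = {
    "ASSET-001": "Floor 1 / Rack A3",  # matches the record
    "ASSET-002": "Floor 3 / Rack F4",  # wrong floor entirely
    "ASSET-003": None,                 # could not be found
    "ASSET-004": "Floor 2 / Rack D2",  # matches the record
}

def accuracy_rate(recorded, observed):
    """Fraction of assets found exactly where the records say they are."""
    matches = sum(
        1 for asset, loc in recorded.items() if observed.get(asset) == loc
    )
    return matches / len(recorded)

print(f"Location accuracy: {accuracy_rate(recorded, observed):.0%}")
```

In practice the "observed" side would come from a structured field audit rather than a dictionary, but the principle is the same: the accuracy figure only exists once you have an independent, on-the-ground source to compare against.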
What’s more, if you really want sponsors to buy into data quality, just watch their jaws drop when you show them the results.