Where to start with overloading


When we design systems, there is always a desire to build something that will support the business for some considerable time. A lot of banks, for example, still run mainframe systems with banking software that goes back literally decades.

For most other businesses, however, applications have a limited shelf-life. In fast-moving industries subject to major regulatory and business-model upheavals (e.g. telecoms), I’ve witnessed applications scheduled for termination just months after they were commissioned.

The problem is that business models do change and consumer trends shift, so the systems we design need to be built with this increasing demand for change in mind. Sadly, most are not, which leads us neatly into the topic of overloading - one of the principal causes of bad data quality.

Overloading occurs when we try to use an attribute to store a value that it was never intended to hold. For example:

  • In a utilities organisation I’ve seen status codes added as a prefix to equipment identifiers to indicate whether the asset is in stores, in repair or in active service.
  • In customer data I’ve seen the title field (e.g. Mr, Mrs, Dr, etc.) appended with a suffix to indicate whether the customer is deceased.
  • Another common type of overloading is where the description or notes field becomes a dumping ground for historic data (such as manufacturer codes and product identifiers) that is now defunct but still needs to be recorded against the equipment.

It’s easy to parse out this information and include it in your data quality assessment or improvement exercise. Before you do, though, it’s important to understand why the situation occurs, because cleansing the data is merely a sticky plaster over a glaring failure in your information management landscape.
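To illustrate how straightforward the parsing step is, here is a minimal sketch based on the earlier examples. The formats are assumptions for illustration only: an equipment identifier prefixed with a single status letter (S = stores, R = repairs, A = active), and a title field with a "(Deceased)" suffix appended.

```python
import re

# Assumed status-prefix convention for this sketch: "S-EQ1042" etc.
STATUS_CODES = {"S": "stores", "R": "repairs", "A": "active"}

def split_equipment_id(raw):
    """Separate an overloaded status prefix from the true identifier."""
    match = re.match(r"^([SRA])-(.+)$", raw)
    if match:
        return STATUS_CODES[match.group(1)], match.group(2)
    return None, raw  # no overloading detected

def split_title(raw):
    """Strip a '(Deceased)' suffix that has been appended to the title."""
    deceased = "(Deceased)" in raw
    title = raw.replace("(Deceased)", "").strip()
    return title, deceased

print(split_equipment_id("S-EQ1042"))  # ('stores', 'EQ1042')
print(split_title("Mrs (Deceased)"))   # ('Mrs', True)
```

The mechanics are trivial; the hard part, as the next section shows, is understanding why the workaround exists before you "fix" it.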

Overloading occurs for a variety of reasons:

  • Poor training of staff leads to misplacement of data values in the wrong fields.
  • Commercial off-the-shelf systems prevent architectural design changes, so users are forced to create hacks or workarounds.
  • The original design team or application support has been disbanded, leaving users with a static system that can’t cope with new business demands.
  • Local rules require subtle differences in codes and identifiers that aren’t supported by the legacy design.

There are many more reasons, but the underlying cause is the same: the system is now doing things it was never designed to do. This has ramifications for your data quality work. If you’re building data quality rules and assessments based on how you think the system should behave, you may be inadvertently marking down data that is actually correct.

For example, in one engineering firm we found that a regional team had to store an additional product identifier in a location field because the original system couldn’t cope with the longer identifiers used by one type of plant equipment. The profiling statistics flagged the data as an anomaly, but it was in fact 100% accurate. It was just overloaded.
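One pragmatic way to stop marking such data down is to encode the known, documented overloading pattern into the rule itself, so the assessment distinguishes "defective" from "valid but overloaded". This is a hypothetical sketch; both identifier formats are assumptions, not the firm's actual conventions:

```python
import re

# Assumed formats for illustration:
STANDARD_ID = re.compile(r"^[A-Z]{3}\d{5}$")            # expected identifier
KNOWN_OVERLOAD = re.compile(r"^[A-Z]{3}\d{5}/LOC\d+$")  # documented regional workaround

def assess(value):
    """Classify a value instead of treating every deviation as a defect."""
    if STANDARD_ID.fullmatch(value):
        return "valid"
    if KNOWN_OVERLOAD.fullmatch(value):
        # Correct data living in an overloaded format: record it for
        # remediation rather than counting it against the business.
        return "valid-overloaded"
    return "defective"

print(assess("ABC12345"))       # valid
print(assess("ABC12345/LOC7"))  # valid-overloaded
print(assess("bad-id"))         # defective
```

Tracking the overloaded category separately gives you an accurate defect count today and a remediation backlog for the longer-term fix.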

In summary, you need to understand the root cause of the overloading and put a plan in place to re-engineer the system, decommission it completely or factor overloading into your data quality capabilities. It’s not enough to simply count an overloaded field as defective - that will just frustrate a business that is doing the best it can. Work with all parties to find a solution, but keep a pragmatic frame of mind until a long-term fix can be found.


About Author

Dylan Jones

Founder, Data Quality Pro and Data Migration Pro

Dylan Jones is the founder of Data Quality Pro and Data Migration Pro, popular online communities that provide a range of practical resources and support to their respective professions. Dylan has an extensive information management background and is a prolific publisher of expert articles and tutorials on all manner of data related initiatives.
