Are you making data quality a design task?

Ask any battle-hardened data quality practitioner and they will tell you that one of the leading causes of data quality defects is a failure to design quality into information systems. I am going to take a specific example of bad system design to explain how data defects quickly become a reality.

Earlier in my career I was asked to migrate data into a system that was built around an abstract data model.

Most businesses deal with customers, employees, suppliers, partners and many other types of party that interact with the organisation. All of this information needs to be modelled in a system somewhere. To get around the problem of needing a separate entity (or table) for each party type, a lot of companies opt for an abstract modelling approach. In this design, the identifying information for every entity is stored in one table, and each fact, or attribute, is held in separate fact tables and linked back via foreign keys.
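
To make this concrete, here is a minimal sketch of what such a model can look like. The table and column names are purely illustrative (they are not from the system I migrated), but the shape is the classic party/attribute pattern:

    -- Illustrative abstract party model (simplified; names are assumptions)
    CREATE TABLE party (
        party_id    NUMBER PRIMARY KEY,
        party_type  VARCHAR2(30) NOT NULL      -- 'CUSTOMER', 'EMPLOYEE', 'SUPPLIER', ...
    );

    CREATE TABLE attribute_definition (
        attribute_id   NUMBER PRIMARY KEY,
        attribute_name VARCHAR2(100) NOT NULL  -- 'DATE_OF_BIRTH', 'CREDIT_LIMIT', ...
    );

    -- Every fact about every type of party ends up as a row in this one table
    CREATE TABLE party_attribute (
        party_id        NUMBER REFERENCES party (party_id),
        attribute_id    NUMBER REFERENCES attribute_definition (attribute_id),
        attribute_value VARCHAR2(4000)         -- dates, numbers, names and codes, all held as text
    );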

This style of modelling provides incredible flexibility when designing new systems. It allows you to add new entities relatively easily because you typically don't need to create new tables.
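
For instance, continuing the illustrative schema above, adding a brand-new party type or attribute is just a couple of row inserts, with no schema change at all:

    -- No new tables or columns needed: a new entity type and a new fact are just data
    INSERT INTO party (party_id, party_type) VALUES (202, 'PARTNER');

    INSERT INTO attribute_definition (attribute_id, attribute_name)
    VALUES (57, 'CONTRACT_RENEWAL_DATE');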

When implemented correctly, there are data quality benefits to this kind of approach. For example, if all of your address information is located in one table, it is easy to standardise your data validation rules in one area. Abstract models also make it easier to master your core subject data centrally, which has obvious data quality benefits.
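
As a sketch of that benefit (the address table and postcode rule here are my own assumptions, not the original system's design), one shared address table means one place to declare a rule that covers every party type:

    -- A single address table shared by all party types: one rule protects everyone
    CREATE TABLE address (
        address_id NUMBER PRIMARY KEY,
        party_id   NUMBER REFERENCES party (party_id),
        postcode   VARCHAR2(10),
        CONSTRAINT chk_postcode_format
            CHECK (REGEXP_LIKE(postcode, '^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$'))  -- illustrative UK-style rule
    );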

So what is the problem?

The first challenge with this type of system design is that, in many cases, multiple data types can be stored against one attribute definition. You can end up with dates, numbers, names, codes and any number of different formats all represented in the same attribute fact table, linked back to their master entities. This makes it very difficult to enforce basic data-type quality rules at the attribute level. For example, you can't enforce uniqueness, completeness or referential integrity rules as you would with a standard system design.
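
To see why, compare a conventional, strongly typed table with the abstract equivalent (again, a simplified sketch rather than the real system):

    -- Conventional design: the database itself rejects bad data
    CREATE TABLE customer (
        customer_id   NUMBER PRIMARY KEY,
        date_of_birth DATE,                          -- wrong types are rejected outright
        email         VARCHAR2(320) NOT NULL UNIQUE  -- completeness and uniqueness enforced
    );

    -- Abstract design: the same facts are just rows of free text, so none of
    -- those rules can be declared at the column level
    INSERT INTO party_attribute (party_id, attribute_id, attribute_value)
    VALUES (101, 12, '31/02/2009');  -- an impossible date, accepted without complaint
                                     -- (assumes party 101 and attribute 12 already exist)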

Instead you have to rely on coding constraints into the applications that use this data store. In my example, this was done badly, so defects found their way into the new system all too easily. When migrating data into the new system we also had major headaches ensuring every conceivable rule was checked, because loading data into this structure bypassed all the standard application logic (we were adding data via the native Oracle connector).
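
Because the load bypassed the application layer, every rule had to be rebuilt as pre- and post-load checks. As a rough illustration of the kind of query involved (the attribute name and expected format are assumptions), even a simple date rule ends up looking like this:

    -- Flag attribute values that should be dates but do not match the expected format
    SELECT pa.party_id,
           pa.attribute_value
    FROM   party_attribute      pa
    JOIN   attribute_definition ad ON ad.attribute_id = pa.attribute_id
    WHERE  ad.attribute_name = 'DATE_OF_BIRTH'
    AND    NOT REGEXP_LIKE(pa.attribute_value, '^[0-9]{4}-[0-9]{2}-[0-9]{2}$');

Multiply that by every attribute and every rule type and you can see why the migration checks ballooned.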

Whilst the system had many elegant design constructs, protecting the quality of data was given little or no thought. Instead, the main goal was to create an extensible framework where new financial entities and attributes could easily be added.

There were also few defect prevention routines built into the applications that leveraged the underlying abstract data storage model, so enforcing data quality rested on the whims of individual developers. Many application teams could build their own interfaces, and with no shared standards the level of quality control varied from one interface to the next, so quality soon suffered.

So often we treat the symptoms of poor data quality but fail to get to the real root of the problem. That is, frankly, terrible design.

Don’t believe me?

Look at the dimensions so often cited in data quality reports - completeness, validity, consistency, timeliness, synchronisation. Problems in each of these all too often stem from a failure to build quality into the design process. Not only is defect prevention nowhere to be seen, but the underlying data design is destined to create opportunities for defects to emerge.

Over time, every business model will start to shift and change direction, putting greater pressure on underlying system designs. These shifts begin to manifest themselves as data quality issues and come to the attention of data workers and, hopefully, the data stewards and data quality practitioners.

When this situation arises, don’t just focus on building another data quality batch job or manual clean-up project. Look at where the underlying design is failing. Make a note of it and ensure that you’re working with the system design team so these kinds of failings are never replicated. Don’t simply complain about the design standards in your organisation and how they result in poor data quality. Be a force for change and work with them to adopt new standards and improvements.

What do you think? Have you experienced data defects stemming from poor system design? How did you mature the design process to "bake data quality in"? I welcome your views.

About Author

Dylan Jones

Founder, Data Quality Pro and Data Migration Pro

Dylan Jones is the founder of Data Quality Pro and Data Migration Pro, popular online communities that provide a range of practical resources and support to their respective professions. Dylan has an extensive information management background and is a prolific publisher of expert articles and tutorials on all manner of data-related initiatives.
