Data quality in medias res


Planning and executing an enterprise information initiative is not easy. Building the business case involves identifying, documenting, verifying and refining a set of requirements that represent the various perspectives of business and technical stakeholders throughout the organization.

Many such initiatives begin with the very best of intentions, and sometimes with a grand vision of delivering a data-driven solution to every business problem. But then the sobering reality of limited resources, especially financial resources, forces the practical compromises of selecting what can be delivered in a reasonable timeframe and within a reasonable budget. Therefore, some requirements get deferred to future phases of the initiative, and the scope of the initial deliverable gets established.

As the work begins, the design and development of the technical architecture is coordinated with the gathering of source data from the relevant business processes. Some data profiling is usually performed, but only to obtain high-level statistics and characteristics about the data for comparison against basic expectations.
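As a rough sketch of what this kind of high-level profiling might look like in Python (the function name `profile_field` and the sample values are made up for illustration, not taken from any particular profiling tool):

```python
from collections import Counter


def profile_field(values):
    """Summarize one field: how many values, how many populated, how many distinct."""
    values = list(values)
    # Treat None and empty strings as "not populated" (an assumption for this sketch).
    populated = [v for v in values if v not in (None, "")]
    counts = Counter(populated)
    return {
        "total": len(values),
        "populated": len(populated),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical sample: a state code column from a source extract.
state_profile = profile_field(["NY", "NY", "", None, "CA"])
print(state_profile)
```

Statistics like these can be compared against basic expectations, but they only describe the shape of the data, not whether the values are right.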

At this point, the data perspective is focused on structural and relational integrity. In other words, was the expected record count received from the source system? Was the provided metadata accurate for mapping the data to the target system? Were the required data fields populated, especially the key fields for joining related sources?
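The three questions above could be sketched as a small Python check. Everything here is hypothetical: the function name `structural_checks`, the field names, and the sample rows are assumptions for illustration only, not a reference to any real system.

```python
def structural_checks(rows, expected_count, required_fields, key_field, related_keys):
    """Run basic structural and relational integrity checks on a list of dict rows."""
    rows = list(rows)

    # Was the expected record count received from the source system?
    count_matches = len(rows) == expected_count

    # Were the required fields populated? Collect the index of each row that falls short.
    rows_missing_required = [
        i for i, row in enumerate(rows)
        if any(row.get(field) in (None, "") for field in required_fields)
    ]

    # Do the key fields join to the related source? Collect keys with no match.
    keys_present = {row.get(key_field) for row in rows} - {None, ""}
    orphan_keys = sorted(keys_present - set(related_keys))

    return {
        "count_matches": count_matches,
        "rows_missing_required": rows_missing_required,
        "orphan_keys": orphan_keys,
    }


# Hypothetical order rows joined against a hypothetical set of known customer IDs.
orders = [
    {"customer_id": "C1", "name": "Ann"},
    {"customer_id": "C2", "name": ""},
    {"customer_id": "C9", "name": "Bob"},
]
report = structural_checks(orders, 3, ["customer_id", "name"], "customer_id", {"C1", "C2"})
print(report)
```

Note that a data set can pass every one of these checks and still contain values that are wrong, which is why they are no substitute for asking about data quality itself.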

Unfortunately, it is often not until the middle of the development timeline, after considerable effort has already gone into the implementation work for the initial deliverable, that someone asks:

“Shouldn’t we be concerned about the quality of the data?”

This common mistake is data quality in medias res, where data quality concerns begin in the middle of an enterprise information initiative and not at the beginning.

As a literary, television and movie technique, in medias res can often be a great method for immediately pulling the audience into the story, usually by throwing them into an opening scene of fast-paced action and drama.

This is why an action movie rarely opens with the hero sitting through an extensive safety briefing about why high-speed driving and highly flammable chemicals could be a potentially dangerous combination. Instead, the movie opens with the hero sitting in a high-end sports car speeding along a winding highway until he loses control while crossing over a bridge high above a river, crashes into a giant fuel truck, and a massive fireball erupts high into the sky above, ripping the bridge apart and raining torrents of fiery pieces down into the raging river below.

Although half the movie’s budget was probably spent on that opening scene, it was worth it because the audience is on the edge of their seats wondering — as the movie now flashes back in time to the beginning of the story — why would the hero do that?

However, as an enterprise information initiative technique, data quality in medias res is only a great method for guaranteeing your eventual, and catastrophic, failure. Don’t wait until the middle of your development efforts to begin thinking about data quality. By then you may be past the point of no return, forced to load the data as is just to deliver on time and on budget.

It might not feel like you’re setting off a massive fireball that will rip your initiative apart when data quality is ignored until it’s too late. However, I have seen that movie before. Trust me; you’re not going to like how it ends.


About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.
