I'm frequently asked: "What causes poor data quality?" There are, of course, many culprits:
- Lack of a data culture.
- Poor management attitude.
- Insufficient training.
- Incorrect reward structure.
But there is one reason that is common to all organizations – poor data architecture.
The (inherited) data architecture problem
Most organizations have some degree of duplication in their data assets; this almost always stems from badly conceived system architecture. I don't envy CIOs and IT leaders, who must constantly transform the historical IT landscape they inherited from their predecessors.
At one company, I discovered 15 systems independently storing facilities management data. The organization was, in effect, managing 15 variations of the same physical asset. Speaking to the head of IT architecture confirmed what everyone could see: "No one designed this, we were just given it."
The problem is that we have constructed our information systems around an assembly line approach. Each business unit creates its view of the world, complete with the functions and data it requires. This silo mentality means that data flows from one function to another, and from system to system. As a result of this interconnectivity, data defects can quickly spread and proliferate across the organization like a virus.
Another problem is that these core master data assets quickly fall out of sync, because each system updates its own copy independently. As a result, users struggle to complete business functions in an optimal way; they are frequently grappling with defects.
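To make the synchronization problem concrete, here is a minimal sketch (in Python, with hypothetical system and field names) that compares the copies of a single record held in different systems and flags every attribute on which they disagree. A real comparison would also have to cope with differing formats, code sets and update timing.

```python
# A minimal cross-system consistency check. Each system is assumed to
# expose its copy of the record as a plain dict; all names are illustrative.
from collections import defaultdict

def find_discrepancies(records_by_system: dict[str, dict]) -> dict[str, set]:
    """Return every attribute whose value differs between systems."""
    values = defaultdict(set)
    for system, record in records_by_system.items():
        for field, value in record.items():
            values[field].add(value)
    return {field: vals for field, vals in values.items() if len(vals) > 1}

# Three of the "15 systems" holding the same physical asset:
copies = {
    "facilities_db":   {"site_id": "F-101", "address": "1 High St",     "status": "active"},
    "maintenance_app": {"site_id": "F-101", "address": "1 High Street", "status": "active"},
    "finance_ledger":  {"site_id": "F-101", "address": "1 High St",     "status": "closed"},
}

print(find_discrepancies(copies))
# e.g. {'address': {'1 High St', '1 High Street'}, 'status': {'active', 'closed'}}
```

Run a check like this across every pair of systems in an information chain and the "virus" effect becomes measurable rather than anecdotal.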
Tom Redman summed it up perfectly in a recent interview when he said:
The way our organizations are set up today is wrong for data. Companies are built around a division of labor, with an assembly line mentality. In the Industrial Age, this was remarkably effective. But now, these silos prevent data sharing.
So why hasn't the "data architecture problem" been solved?
There are many reasons why these issues persist, but some of the most common are:
- Business silos design their view of the world without reference to any central, enterprisewide strategy.
- Lack of strong IT governance to ensure a consistent system design and deployment approach.
- Delays in IT project delivery that cause the business to build tactical solutions.
- "It ain't (really) broke, so let's leave it alone."
In short – poor enterprise strategy leads to poor design, which leads to poor-quality data.
What should your IT and data leaders be doing differently?
The obvious starting point is to have a central strategy for your master entities, the fundamental building blocks of your business. You need to build an enterprise plan for master data.
As Larry English commented in a past TDAN interview:
Real master data management tells us we must design databases around the fundamental resources, such as customer (party), product, financials, facilities, equipment, etc. These must be defined in singular enterprise-strength information models about each discrete resource.
The benefit of applying real master data management is fairly obvious: you're significantly reducing the cost and complexity of managing the same data across hundreds of locations in the business.
Larry continues:
Building redundant and disparate databases is like paying an invoice multiple times, with each and every invoice failing to solve the “enterprise” problems. I have never found a CFO who condoned paying a single invoice multiple times. Why does IT insist that redundant databases are a best practice?
With data mastered centrally and federated to the wider organization, you can start reducing the complexity in your information chains. Less master data moving around the organization means a lower defect rate: there are fewer information handover points where translation issues creep in, and fewer copies to fall out of sync. The long-term strategy should be to deploy new applications that can access and maintain master data while delivering their individual business functions.
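As an illustration of that target state, here is a minimal sketch of the pattern: one hub owns the golden record, and consuming applications federate reads from it instead of holding local copies. The class and method names are hypothetical; a production hub would sit behind a service interface with survivorship rules, security and audit trails.

```python
# A minimal sketch of "master centrally, federate outward".
# All names are illustrative, not a reference implementation.

class MasterDataHub:
    """Single system of record for one master entity (here, customer)."""

    def __init__(self) -> None:
        self._golden_records: dict[str, dict] = {}

    def upsert(self, entity_id: str, attributes: dict) -> None:
        """Every create/update flows through one controlled entry point."""
        record = self._golden_records.setdefault(entity_id, {})
        record.update(attributes)

    def get(self, entity_id: str) -> dict:
        """Consumers read from the hub; no local duplicates to drift."""
        return dict(self._golden_records[entity_id])  # defensive copy

hub = MasterDataHub()
hub.upsert("CUST-001", {"name": "Acme Ltd", "segment": "enterprise"})

# Billing and CRM see the same record: fix it once, it's fixed everywhere.
billing_view = hub.get("CUST-001")
crm_view = hub.get("CUST-001")
assert billing_view == crm_view
```

The design point is the single entry path for updates; that is what removes the handover points where defects creep in.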
Your goal should be to have no overlap between master data sets, combined with accurate, timely information across the company.
What else is happening to help eliminate the Industrial Age mentality toward data?
I believe we are at a turning point in the data sector, and data leaders now have to make some critical decisions. No longer can they simply maintain the status quo of legacy data strategy. If they don't innovate and change, they will be overtaken by younger, leaner businesses that are driven by customer-centric models and far lower operating costs.
In terms of drivers that are helping change the situation, here are some obvious ones:
- Data governance: The growth of data governance has been impressive, and frameworks like DMBOK place data governance at the center of data strategy. I believe this is right – only with strong governance can we prevent the disjointed data architectures found in the majority of data-heavy businesses.
- Data quality management: By applying real master data management, our job as data quality practitioners becomes far easier. But, of course, there will always be the need to implement root-cause prevention and "bake" data quality into your data architecture (see the validation sketch after this list).
- Master data management (MDM): At the moment, we mostly see tactical solutions for MDM. Organizations are creating hubs to reduce the amount of duplication taking place. We need to move beyond tactical solutions to fully architected frameworks where data is truly mastered and federated wherever it's required.
- Greater awareness of data as an asset: This is the buzz-phrase of the moment, but there is certainly a sense that executives are waking up to the benefits of decision making that's driven by data. Ironically, applying real MDM is a perfect enabler – because it prevents duplication and inconsistencies, helping to provide far greater trust in analytics and reporting.
- Big data / PaaS / cloud apps: With cloud-based platforms, it has never been easier to centrally manage data and deliver federated services across the organization, and indeed the wider customer ecosystem.
- Regulatory compliance: With the increased regulatory demand for initiatives such as single customer view, "privacy by design" and more robust data governance and data quality measures, many organizations are waking up to the fact that it is far better to architect the right data strategy as opposed to meeting each compliance directive on a wing and a prayer using Excel and monumental information chains.
- The Internet (of Things): There is no doubting that the Internet is radically transforming conventional business models, and nowhere is this more apparent than with the meteoric rise of IoT we're about to witness. IoT has profound implications for data quality improvement, but these will only be realized through an MDM strategy.
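Returning to the data quality management point above: one concrete way to "bake" quality into the architecture is to validate records at the point of capture, so defects never enter the information chain at all. Here is a minimal sketch; the rules and field names are deliberately simple, hypothetical examples.

```python
# A minimal sketch of validation at source: defective records are
# rejected at capture instead of being cleansed downstream.
import re

# Hypothetical rules for a customer record.
VALIDATION_RULES = {
    "customer_id": lambda v: bool(re.fullmatch(r"CUST-\d{3,}", v)),
    "email":       lambda v: "@" in v and "." in v.split("@")[-1],
    "segment":     lambda v: v in {"smb", "mid-market", "enterprise"},
}

def validate(record: dict) -> list[str]:
    """Return a list of rule violations; an empty list means clean."""
    errors = []
    for field, rule in VALIDATION_RULES.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not rule(record[field]):
            errors.append(f"invalid value for {field}: {record[field]!r}")
    return errors

def save(record: dict, store: list) -> None:
    """The only write path: no record enters the store with known defects."""
    errors = validate(record)
    if errors:
        raise ValueError("; ".join(errors))
    store.append(record)
```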
What steps should you take next?
To be fair, this is long-term strategy stuff. It's not the usual tactical data quality improvement I talk about here on the Roundtable.
For a lot of firms, this will require a total rethink of the way they do business and manage their data. It's also the reason why new startups – for example, those in the banking sector – are able to compete aggressively with entrenched mega-firms. They are building out an IT and data landscape that is far more closely modeled on their business. As a result, they're able to build with quality and master data management in mind right from the outset. Here are some obvious starting points:
- Identify your master data sets:
- Customers.
- Assets.
- Partners.
- Products.
- Equipment.
- Facilities.
- Understand which business functions are driven by these master data sets.
- Identify where tactical MDM can be replaced with real MDM.
- Specify policies for new system development:
- Eliminate silos.
- Manage core data assets at an enterprise level.
- Make sure your data quality processes for ongoing management and control remove duplicates at the source (see the matching sketch after this list).
- Focus on information chain reduction.
- Develop a central strategy on data consolidation and modernization, with a focus on real master data management.
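On removing duplicates at the source, the common pattern is match-before-create: an inbound record is compared with the master on normalized key attributes, and a hit triggers a merge or a review rather than a new record. The sketch below is deliberately naive; real matching engines add probabilistic scoring and much richer normalization.

```python
# A minimal sketch of duplicate prevention at source (match-before-create).
# Matching on normalized name + postcode is illustrative only.

def normalize(text: str) -> str:
    """Crude normalization so 'Acme Ltd.' and 'ACME LTD' compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def find_match(candidate: dict, master: list[dict]) -> dict | None:
    """Return an existing record that looks like the same real-world entity."""
    key = (normalize(candidate["name"]), normalize(candidate.get("postcode", "")))
    for record in master:
        if (normalize(record["name"]), normalize(record.get("postcode", ""))) == key:
            return record
    return None

master = [{"name": "Acme Ltd.", "postcode": "AB1 2CD"}]
new_entry = {"name": "ACME LTD", "postcode": "ab1 2cd"}

match = find_match(new_entry, master)
if match:
    print(f"Duplicate of existing record: {match['name']}")  # merge, don't insert
else:
    master.append(new_entry)
```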
What do you think about this topic?
I appreciate that this is a little different than the shorter-term "do this now" type of post I usually write, so please share your views in the comments below. Perhaps you've been on a similar journey? I'd love to hear about your experiences. What's missing from this article?
1 Comment
What causes poor data quality? The answer is easy: humans. Humans design poor business processes, inadequate data designs, poor data models and poor processes for managing data. The solutions to data quality, however, are not easy. They are not solved by technology; they are a human design problem. Humans are ambiguous creatures and seem content working in a world of ambiguity. Machines, however, cannot deal with ambiguity. And this is the real gap: between man and machine.
We see the same complaints about the proliferation of silos, the lack of data architecture and data governance, and the need for MDM and data quality, yet over the past 20 years we have seen little improvement in data. Quantities of data increase and sources of data increase, but we continue to apply the same data modeling techniques we used when we created these silos of poor data 30 and more years ago. Data has never been architected, and current practices of data modeling are the equivalent of bricklaying.
The solutions suggested remain the same as well. "Customer-centric" solutions, breaking down silos, integrating data, data architecture, bridging the business-IT gap, etc. haven't worked.
Data is not a physical resource. Data is a language of communication. IT practitioners are ill-equipped to manage data communications, and their solutions are sorely inadequate. As a language, data requires data literacy: syntax, semantics and pragmatics. Organizations would be wise to hire linguists, librarians and sociologists to improve data, rather than relying on technologists. Data is NOT a technology domain. Solutions such as socio-technical systems design and other non-IT practices should be adopted rather than trying to solve the problems of human communication through technology alone.
If your organization is not proficient in syntax, semantics and pragmatics, and if your staff do not use techniques such as ontology and taxonomy, then the organization remains data illiterate and no technology will improve the data.
The proliferation of data, data silos and lack of data architecture will remain. Natural language evolves unbounded by rules and technology. Data is a natural language. It's time to accept this and develop techniques to help improve data literacy rather than wasting time trying to "master" data! Time to think outside the database!