Some years ago I was consulting at a large financial institution. I was brought in to help the company migrate from one financial classification scheme to a new one. The project manager assumed this was a three-month project.
He was mistaken.
Nearly six months into the project, we realized this was no small feat. The legacy classification had been shared hundreds of times, crossing countless organizational boundaries.
The big problem lay in the way the provider of the classification supplied its information. The data was emailed or shipped on CD, so it became virtually impossible to track the proliferation of its contents across the organization. Some departments were up to date with the latest standard; others were far behind.
Another organization, a utilities provider, received pricing information from a third-party national supplier each month. Once again, the data would enter the company, be heavily cleansed and transformed, and then be distributed across the organization.
The problems with managing data across all these boundaries are varied:
- Conflicting copies of data from the same origin lead to delays in decision-making.
- The cost of tracking the data and maintaining lineage information rises.
- This model leads to duplicated effort and money potentially wasted on unnecessary licensing.
I believe data producers have to re-think the way they create boundaries around their data.
We have to get away from thinking of data as having an "upstream" and a "downstream" journey. There should be no journeys, and hence no boundaries, involved.
Organizations increasingly need their data in real time, so data and boundary design has to reflect this. In my examples above, the classification index provider should have given us real-time access to its data. Likewise for the utilities data provider.
Hosting data in this way effectively eliminates the flow of data across boundaries. There is just one boundary left: the 1:1 relationship with the information supplier.
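To make this concrete, here is a minimal sketch of what real-time access could have looked like in my first example. The URL, endpoint, and field names ("version", "entries") are all hypothetical; the point is simply that every consumer reads from the same authoritative source instead of from emailed or CD-shipped copies.

```python
import json
import urllib.request

# Hypothetical hosted endpoint for the classification provider.
PROVIDER_URL = "https://classifications.example.com/api/v1/scheme"

def fetch_current_scheme() -> dict:
    """Read the classification scheme straight from the provider.

    Every consumer calls the same endpoint, so there is exactly one
    boundary (the 1:1 relationship with the supplier) and no stale
    copies proliferating over email or CD.
    """
    with urllib.request.urlopen(PROVIDER_URL) as response:
        return json.load(response)

if __name__ == "__main__":
    scheme = fetch_current_scheme()
    print(scheme["version"], "with", len(scheme["entries"]), "entries")
```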
But here’s the interesting part: boundaries should allow information to flow in both directions.
One company I interviewed came up with the novel idea of offering customers incentives to send improved data back. This company wasn't just pushing data out to clients; it was also pulling it back, with improvements. As a result, its overall data quality metrics reached Six Sigma levels in a short time.
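As a sketch of that two-way boundary, a supplier might expose a corrections endpoint alongside its read API. Everything here is hypothetical (the URL, the payload shape); it simply shows the "pull back" direction as code.

```python
import json
import urllib.request

# Hypothetical endpoint where consumers submit improved data.
FEEDBACK_URL = "https://classifications.example.com/api/v1/corrections"

def submit_correction(record_id: str, field: str, proposed_value: str) -> int:
    """Send an improved value back to the data supplier.

    The supplier can review and merge each correction, then republish,
    so every consumer benefits instead of each one patching its own copy.
    """
    payload = json.dumps({
        "record_id": record_id,
        "field": field,
        "proposed_value": proposed_value,
    }).encode("utf-8")
    request = urllib.request.Request(
        FEEDBACK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```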
As a data leader or influencer in your organization, think about your data boundaries:
- Why are they there?
- Is there some political or technical reason for the boundaries that is now outdated?
- How can you get data from source to user in far fewer "hops" than it currently takes?
Perhaps you’ve already had some success with re-thinking your data boundaries. If so, please share your experiences below.