Data lineage means many things to many people, but it essentially refers to provenance: how do you prove where your data comes from?
It’s really a simple exercise. Just pull an imaginary string of data from where the information presents itself, back through the labyrinth of data stores and processing chains, until you can go no further.
I’m constantly amazed that so few organisations practise sound data lineage management despite having fairly mature data quality or even data governance programs. On a side note, if ever there was a justification for the importance of data lineage management, just take a look at the brand damage caused by the recent European horse meat scandal.
But I digress. Why is data lineage your secret data quality weapon?
The simple answer is that data lineage forces your organisation to address two big issues that become all too apparent:
- Lack of ownership
- Lack of formal information chain design
Lack of Ownership
As you examine data lineage, you are in effect unraveling the life story of how that data was created. You’ll find that the data you believed was owned by Marketing actually has its root in Billing. The financial data you send to the CFO is also processed by Compliance, and so on.
By building up a complete picture of where your data comes from, it’s far easier to understand the true ownership model of an organisation’s data.
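As a rough illustration of the idea above, here is a minimal Python sketch that walks a lineage map upstream until it can go no further, then looks up who runs the originating system. The dataset names, the `UPSTREAM` map, the `SYSTEM_OWNER` table and the `trace_to_roots` helper are all hypothetical, invented for this example; a real lineage record would come from your own systems.

```python
from collections import deque

# Hypothetical lineage map: each dataset lists the datasets it is derived from.
UPSTREAM = {
    "marketing.customer_report": ["crm.customers"],
    "crm.customers": ["billing.accounts"],
    "billing.accounts": [],  # no upstream source: the data originates here
}

# Hypothetical mapping from system prefix to the team that runs that system.
SYSTEM_OWNER = {"marketing": "Marketing", "crm": "Sales Ops", "billing": "Billing"}


def trace_to_roots(dataset, upstream):
    """Walk upstream from `dataset` and return the datasets with no further source."""
    roots, seen, queue = [], {dataset}, deque([dataset])
    while queue:
        current = queue.popleft()
        parents = upstream.get(current, [])
        if not parents:
            roots.append(current)
        for parent in parents:
            if parent not in seen:  # guard against loops in the lineage map
                seen.add(parent)
                queue.append(parent)
    return roots


roots = trace_to_roots("marketing.customer_report", UPSTREAM)
owners = [SYSTEM_OWNER[root.split(".")[0]] for root in roots]
print(roots, owners)
```

In this made-up example the report that Marketing believes it owns traces back to a Billing system, which is exactly the kind of finding that reshapes the ownership conversation.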
Lack of Information Chain Design
I once implemented Information Chain Design in a company that believed it had three sources of plant equipment data. We had to stop counting and go back to the drawing board when the 30th system was uncovered.
The reality is that you will never improve data quality if you don’t simplify and improve your information chains. You need to measure, monitor and prevent data defects from entering and flowing through information chains. To do this you need to start with the obvious question: what information chains do we have? Be prepared for some surprises.
In Summary
There are many other benefits of sound data lineage management, but ownership and information chain improvements are a great starting point.
What’s more, you can start recording data lineage in a simple spreadsheet and move to a more advanced tool later, once the organisation begins to see the benefits. There really are no barriers to getting started.
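To show how little is needed to begin, here is a small Python sketch that reads a lineage spreadsheet exported as CSV and counts the distinct systems each data item passes through. The column names (`data_item`, `source_system`, `target_system`, `owner`), the sample rows and the `systems_per_item` helper are assumptions made up for this example, not a prescribed template.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export: one row per hop in an information chain.
SHEET = """data_item,source_system,target_system,owner
plant_equipment,Maintenance,Asset Register,Engineering
plant_equipment,Asset Register,Finance Ledger,Finance
plant_equipment,Site Survey,Asset Register,Operations
invoice_total,Billing,Finance Ledger,Billing
"""


def systems_per_item(csv_text):
    """Return, for each data item, the set of systems that appear in its chain."""
    systems = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        systems[row["data_item"]].update([row["source_system"], row["target_system"]])
    return dict(systems)


inventory = systems_per_item(SHEET)
for item, systems in sorted(inventory.items()):
    print(item, len(systems), sorted(systems))
```

Even a listing this simple makes it visible when a data item you thought lived in a handful of systems actually touches many more.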