Struggling to get started with data quality? Start with data lineage


Many people don’t know where to start with data quality. They get bogged down in questions about dimensions, ownership, rules and tools. The problem can seem too vast even to begin making sense of their data landscape, let alone to transform it into a well-governed, high-quality asset.

A lot of data quality initiatives start with one person who has the drive but lacks the people to support them. Tools and technology may not be freely available until value has been demonstrated. I’ve been in this situation several times in my career and have found that the best place to start is always the lineage of your data, because it lays the perfect foundation for a data governance and data quality management framework everyone can buy into.

Data lineage is one of those techniques that anyone can learn quickly. It is more an exercise in persistence and stubbornness than a methodology-based technique.

The first step is to speak with your business users and identify which critical data elements drive the core business functions. For example, customer data, product data, billing data and contract data may be critical to your organisation.

Take one data set and start to build a lineage path, tracing each data element back to its source. You will often find that a single subject area (e.g. customer) is sourced from a variety of locations.

For example, a customer database may be populated from web forms completed by customers, purchased third-party lists, trade show events, mailshot response cards and telephone inquiries.
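If you want to keep the lineage somewhere you can query as it grows, the tracing step can be sketched in Python. This is a minimal sketch, assuming each node (a table, system or input channel) is recorded together with the nodes that feed it directly. Every node name below is hypothetical and simply mirrors the customer example above. The walk collects the nodes that have no further upstream feeds, which are the original sources.

```python
from collections import deque

# Hypothetical lineage for the customer subject area: each node maps to the
# nodes that feed it directly. All names are illustrative, not real systems.
LINEAGE: dict[str, list[str]] = {
    "crm.customer": ["staging.customer_merge"],
    "staging.customer_merge": [
        "web.signup_form",
        "vendor.purchased_list",
        "events.trade_show_leads",
        "marketing.mailshot_responses",
        "callcentre.phone_enquiries",
    ],
}


def trace_sources(target: str, lineage: dict[str, list[str]]) -> set[str]:
    """Walk upstream from `target` and return the root sources (nodes with no feeds)."""
    roots: set[str] = set()
    seen: set[str] = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        upstream = lineage.get(node, [])
        # A node with nothing feeding it is an original source.
        if not upstream and node != target:
            roots.add(node)
        for parent in upstream:
            if parent not in seen:  # guard against duplicates and cycles
                seen.add(parent)
                queue.append(parent)
    return roots


print(sorted(trace_sources("crm.customer", LINEAGE)))
# ['callcentre.phone_enquiries', 'events.trade_show_leads',
#  'marketing.mailshot_responses', 'vendor.purchased_list', 'web.signup_form']
```

The `seen` set keeps the walk from looping if a feed is recorded twice or the map accidentally contains a cycle. Staging layers such as the hypothetical `staging.customer_merge` are passed through rather than reported as sources.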

Create a simple spreadsheet and give each source a unique label (applying existing naming standards if there are any). You can then link the sources together and begin identifying ownership and accountability. By speaking to owners and data workers, you can establish some initial data quality levels, since everyone has an opinion on how much they trust a data set.
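A minimal sketch of that spreadsheet as a CSV register follows. It assumes a hypothetical labelling scheme of `<AREA>-SRC-<NNN>` for organisations without their own standard, and a 1 to 5 trust rating gathered from interviews, with 0 meaning "not yet rated". The field names, the scheme and the `needs_attention` rule are illustrative assumptions, not an established standard.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SourceEntry:
    source_id: str    # unique label (hypothetical scheme "<AREA>-SRC-<NNN>")
    description: str  # human-readable name of the source
    feeds: str        # downstream system or table this source populates
    owner: str = ""   # accountable person or team; empty until identified
    trust: int = 0    # 0 = not yet rated, else 1 (low) to 5 (high) from interviews


def label_sources(area: str, descriptions: list[str], feeds: str) -> list[SourceEntry]:
    """Give each source a sequential unique ID; owner and trust are filled in later."""
    return [
        SourceEntry(f"{area}-SRC-{i:03d}", desc, feeds)
        for i, desc in enumerate(descriptions, start=1)
    ]


def needs_attention(entries: list[SourceEntry]) -> list[str]:
    """Flag sources with no known owner or a low trust rating (an illustrative rule)."""
    return [e.source_id for e in entries if not e.owner or 0 < e.trust <= 2]


def to_csv(entries: list[SourceEntry]) -> str:
    """Render the register as CSV text that a spreadsheet tool can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(SourceEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()


register = label_sources(
    "CUST",
    ["Web sign-up form", "Purchased third-party list", "Trade show leads"],
    feeds="crm.customer",
)
# Ownership and trust captured from conversations with owners and data workers.
register[0].owner, register[0].trust = "Digital team", 4
register[1].owner, register[1].trust = "Marketing", 2

print(to_csv(register), end="")
print(needs_attention(register))  # ['CUST-SRC-002', 'CUST-SRC-003']
```

Writing the returned text to a `.csv` file lets you keep working in whatever spreadsheet tool you already use. The flagged list gives you a first set of conversations to have about ownership and trust.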

Start to build a large map of this landscape using tools such as Visio, LucidChart or Gliffy. You can create a different map for each data subject area (e.g. customer, product) or create a map for each system, tracing back its source data. It depends on your area of focus.
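If you would rather generate the map than draw it by hand, one option that differs from the tools named above is Graphviz, which renders diagrams from plain-text DOT files. The sketch below assumes each link is recorded as an (upstream, downstream, subject area) triple. All node names are hypothetical. It emits one DOT diagram per subject area.

```python
# Each edge: (upstream node, downstream node, subject area). Names are hypothetical.
Edge = tuple[str, str, str]

EDGES: list[Edge] = [
    ("web.signup_form", "crm.customer", "customer"),
    ("vendor.purchased_list", "crm.customer", "customer"),
    ("erp.product_master", "web.catalogue", "product"),
]


def to_dot(edges: list[Edge], subject_area: str) -> str:
    """Emit a Graphviz DOT digraph containing only the edges for one subject area."""
    lines = [f'digraph "{subject_area}" {{', "  rankdir=LR;"]  # draw left to right
    for upstream, downstream, area in edges:
        if area == subject_area:
            lines.append(f'  "{upstream}" -> "{downstream}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(EDGES, "customer"))
```

Saving the output as `customer.dot` and running `dot -Tsvg customer.dot -o customer.svg` produces an image you can share. A per-system view could filter on the downstream node instead. This would show only the direct feeds into that system, so tracing further upstream would need a graph walk.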

To supplement your skills in this area, I would strongly recommend you purchase a copy of Enterprise Knowledge Management: The Data Quality Approach by fellow Roundtable contributor David Loshin. It contains an excellent section on information chain management.

Your data lineage map will give you vital insight into a hidden world underpinning your most important business functions. Beginning this discovery process will help you pin down known problems and uncover their root causes.

Importantly, the map becomes a communication tool and helps you begin the critical task of changing the culture around data. The map should be simple to understand and convey meaning from both technical and business viewpoints.

By explaining to people the vital role they play in the journey of data as it flows through their team or application, you can start changing hearts and minds about the way they approach data quality.

What do you think? Have you created data lineage maps? What process did you use? Was it a useful exercise? Please share your views.


About Author

Dylan Jones

Founder, Data Quality Pro and Data Migration Pro

Dylan Jones is the founder of Data Quality Pro and Data Migration Pro, popular online communities that provide a range of practical resources and support to their respective professions. Dylan has an extensive information management background and is a prolific publisher of expert articles and tutorials on all manner of data-related initiatives.
