Reference data lineage

Reference data lineage really comes down to two questions: what are the authoritative sources for reference data, and which applications use enterprise reference data?

The question of authority matters because reference values must be consistent. Without agreed-upon authoritative sources, there is little or no governance over the sets of values incorporated into different versions of the reference domains. The result is downstream inconsistency, especially in derived information products such as reports and analyses.

For example, reports may aggregate records along reference dimensions, especially hierarchical ones like product categories or geographic locations. If different versions of the hierarchical dimension data (sourced from a reference domain) are in use, the derived reports will differ, which can cause confusion in the boardroom when their results are shared.
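To make the boardroom scenario concrete, here is a minimal Python sketch that rolls up the same sales records under two versions of a product-category hierarchy. The products, amounts, and the reclassification of "tablet" are hypothetical, invented only for illustration; `rollup` and `diff_reports` are illustrative helpers, not part of any particular tool.

```python
from collections import defaultdict

# Hypothetical sales records: (product, amount). Illustrative data only.
SALES = [("laptop", 1200.0), ("tablet", 400.0), ("phone", 800.0), ("monitor", 300.0)]

# Two hypothetical versions of the same product-category hierarchy.
# In version B, "tablet" has been reclassified from "computers" to "mobile".
HIERARCHY_A = {"laptop": "computers", "tablet": "computers", "phone": "mobile", "monitor": "peripherals"}
HIERARCHY_B = {"laptop": "computers", "tablet": "mobile", "phone": "mobile", "monitor": "peripherals"}


def rollup(sales, hierarchy):
    """Aggregate sales amounts by parent category in the given hierarchy."""
    totals = defaultdict(float)
    for product, amount in sales:
        totals[hierarchy[product]] += amount
    return dict(totals)


def diff_reports(a, b):
    """Return the categories whose totals differ, mapped to (total_in_a, total_in_b)."""
    return {
        k: (a.get(k, 0.0), b.get(k, 0.0))
        for k in sorted(a.keys() | b.keys())
        if a.get(k, 0.0) != b.get(k, 0.0)
    }


report_a = rollup(SALES, HIERARCHY_A)
report_b = rollup(SALES, HIERARCHY_B)
print(diff_reports(report_a, report_b))
# {'computers': (1600.0, 1200.0), 'mobile': (800.0, 1200.0)}
```

Note that the grand total is the same (2,700) in both reports, so a quick sanity check on overall revenue would not reveal the problem; only the category-level figures disagree.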

The second question is one of dependency. It is important to know which business processes, applications, and data artifacts depend on the reference data sets. When changes to business policies alter the makeup of a reference domain, it is critical to know how those changes propagate across the enterprise.
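One way to answer the dependency question is to record "is consumed by" relationships as a directed graph and walk it downstream from a reference data set. The Python sketch below assumes a hypothetical `DEPENDENTS` map; the node names are invented, and a real inventory would come from whatever metadata repository the organization maintains.

```python
from collections import deque

# Hypothetical dependency edges: each key is consumed by the items in its list.
# Node names are illustrative only.
DEPENDENTS = {
    "ref:country_codes": ["app:billing", "app:crm"],
    "ref:product_categories": ["app:catalog", "report:sales_by_category"],
    "app:billing": ["process:invoicing", "report:revenue_by_region"],
    "app:crm": ["report:customer_segments"],
    "app:catalog": ["report:sales_by_category"],
}


def impacted_by(node, dependents):
    """Return every artifact that directly or transitively depends on `node`."""
    seen = set()
    queue = deque([node])
    while queue:  # breadth-first walk downstream
        current = queue.popleft()
        for consumer in dependents.get(current, []):
            if consumer not in seen:  # skip artifacts already reached by another path
                seen.add(consumer)
                queue.append(consumer)
    return seen


print(sorted(impacted_by("ref:country_codes", DEPENDENTS)))
# ['app:billing', 'app:crm', 'process:invoicing', 'report:customer_segments', 'report:revenue_by_region']
```

The `seen` set matters because the same artifact can be reached through several paths (here, `report:sales_by_category` depends on product categories both directly and through `app:catalog`), and it also keeps the walk from looping if the recorded dependencies ever contain a cycle.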

Both questions concern data lineage: tracking data either from its original source to the organization's reference data repositories, or from those repositories to the corresponding uses. This makes lineage a prominent part of any metadata strategy, especially for reference data.
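The upstream direction can be checked in a similar way: follow each copy of a reference data set back to its root sources and flag any root that is not on the agreed list of authoritative sources. This Python sketch assumes a hypothetical `UPSTREAM` map ("was loaded from") and a hypothetical `AUTHORITATIVE` set; neither reflects a specific product or standard.

```python
# Hypothetical upstream lineage: each node lists the sources it was loaded from.
UPSTREAM = {
    "repo:mdm.country_codes": ["source:iso_3166_feed"],
    "app:billing.countries": ["repo:mdm.country_codes"],
    "app:crm.countries": ["repo:mdm.country_codes", "source:spreadsheet_upload"],
}

# Sources the organization has (hypothetically) agreed to treat as authoritative.
AUTHORITATIVE = {"source:iso_3166_feed"}


def origins(node, upstream):
    """Return the root sources (nodes with no recorded upstream) that feed `node`."""
    roots, stack, seen = set(), [node], set()
    while stack:  # depth-first walk upstream
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = upstream.get(current, [])
        if not parents:
            roots.add(current)
        stack.extend(parents)
    return roots


def unapproved_origins(node, upstream, authoritative):
    """Return root sources of `node` that are not on the authoritative list."""
    return origins(node, upstream) - authoritative


print(unapproved_origins("app:billing.countries", UPSTREAM, AUTHORITATIVE))  # set()
print(unapproved_origins("app:crm.countries", UPSTREAM, AUTHORITATIVE))  # {'source:spreadsheet_upload'}
```

In this made-up example, the CRM copy of the country codes mixes values from the governed feed with a spreadsheet upload, which is exactly the kind of ungoverned version that leads to the downstream inconsistencies described above.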
