The first step in establishing governance for reference data is assessing the existing reference data landscape: understanding what reference data sets are used, who is using them, and how they are being employed to support business processes. That suggests a three-pronged approach to identifying organizational business process and application dependencies on reference data domains.
Two of these prongs are empirical, involving analyses to find reference data domains (embodied via code value sets or enumerated inline inside programs) and then figuring out what they represent. The third prong involves engaging the business users to solicit their input.
- Empirical data evidence: This might include profiling the data sets to find data elements that are populated with value sets that exhibit the characteristics of reference data, such as frequently-used value sets with limited ranges or collections of values. After identifying these candidate value sets, review the data element metadata to assess whether the naming and any associated documentation or definitions indicate use of reference data.
- Empirical code evidence: Reviewing application code for branching statements (such as “case statements” or “switch statements”) that enumerate code lists that represent reference data domains.
- Engaging users and soliciting their feedback: This is probably the most useful prong, as it exposes the intended and expected uses of the reference domains and gives the data management professional the opportunity to tease out specific definitions of the reference concepts.
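To make the data-profiling prong concrete, here is a minimal sketch of how candidate reference data columns might be flagged. The function name `find_reference_candidates`, the thresholds `max_distinct` and `min_rows_per_value`, and the sample records are all illustrative assumptions, not part of any particular profiling tool; real thresholds would need tuning against the actual data.

```python
from collections import Counter


def find_reference_candidates(rows, max_distinct=10, min_rows_per_value=2.0):
    """Flag columns whose values look like a reference data domain.

    rows is a list of dicts, one per record. A column is a candidate when it
    has few distinct values and each value is, on average, used repeatedly.
    Both thresholds are assumed heuristics, not established rules.
    """
    if not rows:
        return {}
    candidates = {}
    for column in rows[0].keys():
        # Ignore missing values so they do not count as a code.
        values = [row[column] for row in rows if row.get(column) is not None]
        if not values:
            continue
        counts = Counter(values)
        distinct = len(counts)
        if distinct <= max_distinct and len(values) / distinct >= min_rows_per_value:
            candidates[column] = sorted(counts)
    return candidates


# Hypothetical sample records, for illustration only.
sample_rows = [
    {"order_id": 1001, "status": "O", "country": "US"},
    {"order_id": 1002, "status": "C", "country": "US"},
    {"order_id": 1003, "status": "O", "country": "CA"},
    {"order_id": 1004, "status": "X", "country": "US"},
    {"order_id": 1005, "status": "O", "country": "CA"},
    {"order_id": 1006, "status": "C", "country": "US"},
]

candidates = find_reference_candidates(sample_rows)
print(candidates)
# {'status': ['C', 'O', 'X'], 'country': ['CA', 'US']}
```

A result like this is only a list of candidates. The column names and value sets still need to be checked against the metadata and confirmed with business users before they are treated as reference domains.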
Here is a basic outline of the initial process for soliciting and documenting metadata about reference data as the prelude to implementing governance:
- Survey business applications to identify the points at which reference data domains are used.
- Provide a draft characterization of the reference domains that are used, including a proposed name, definition, conceptual domain, and list of values.
- Create an inventory of reference data concepts used as a repository for the proposed reference data metadata.
- Instantiate a framework for capturing that reference metadata.
- For each reference data set:
  - Provide a conceptual domain definition.
  - List the value meanings for members of the conceptual domain.
  - Document the values used in the value domain.
  - Provide a mapping of permissible values (value to value meaning).
- Validate with the business users.
- Commit an agreed-to version of the reference data metadata.
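The outline above could be captured in a simple catalog record. The following is a minimal sketch, assuming a hypothetical `ReferenceDomain` class and an invented order status example; the field names mirror the outline (name, definition, conceptual domain, value meanings, permissible values), but the structure itself is an assumption rather than a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReferenceDomain:
    """One entry in the reference data inventory (hypothetical structure)."""
    name: str
    definition: str
    conceptual_domain: str
    value_meanings: List[str]
    # Permissible values: stored value -> value meaning.
    permissible_values: Dict[str, str] = field(default_factory=dict)
    validated_by_business: bool = False
    version: int = 0

    def problems(self) -> List[str]:
        """Return inconsistencies between values and value meanings."""
        issues = []
        for value, meaning in self.permissible_values.items():
            if meaning not in self.value_meanings:
                issues.append(f"value {value!r} maps to unknown meaning {meaning!r}")
        mapped = set(self.permissible_values.values())
        for meaning in self.value_meanings:
            if meaning not in mapped:
                issues.append(f"meaning {meaning!r} has no value")
        return issues

    def commit(self) -> int:
        """Commit an agreed-to version once it is consistent and validated."""
        if self.problems():
            raise ValueError("; ".join(self.problems()))
        if not self.validated_by_business:
            raise ValueError("not yet validated with the business users")
        self.version += 1
        return self.version


# Hypothetical draft characterization of one reference domain.
order_status = ReferenceDomain(
    name="Order Status",
    definition="The processing state of a customer order.",
    conceptual_domain="Order processing states",
    value_meanings=["Open", "Closed", "Cancelled"],
    permissible_values={"O": "Open", "C": "Closed", "X": "Cancelled"},
)

order_status.validated_by_business = True  # set after review with the users
committed_version = order_status.commit()
print(committed_version)  # 1
```

Keeping the validation flag separate from the consistency check reflects the order of the outline: the mapping is drafted and checked first, and a version is committed only after the business users agree to it.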
The objective of this process is to build the catalog as a foundation for further analysis. In upcoming posts we will look at the subsequent stages of this process.