In my prior posts about operational data governance, I've suggested the need to embed data validation as an integral component of any data integration application. In my last post, we looked at an example of using a data quality audit report to ensure the fidelity of the data integration processes that load data from a variety of sources into a data warehouse.
Once it becomes apparent that the definition of data quality lies in the eyes of the downstream data consumers, it's important to note the implications for data validation:
- There may be multiple versions of “validity.” When data use is context-driven, what is valid for one data user may be irrelevant for others. Yet each data consumer is entitled to a view that conforms to the validity rules associated with the business process.
- There may be different levels of severity for invalid data. In some cases, invalid data is logged but still loaded into the target data warehouse. In more severe situations, the data integration process fails, and the invalidity has to be remediated before attempting a restart. There's a wide spectrum between those two extremes, but one common theme is the need to specify the types of invalidities, their severity, who should be notified, and what needs to happen when the issue is identified.
- All of these specifications need to be operational simultaneously. This third implication suggests a greater level of complexity than the first two. Although different data consumer communities may have different expectations, you cannot focus on one community while ignoring the others. There must be a business directive to guarantee that all data validation rules will at least be monitored, if not assured, at the same time.
- There must be ways to remediate invalid data without creating inconsistency. This last implication may be counterintuitive, but if we are willing to allow different constituencies to define their own expectations, it's not unreasonable for those sets of expectations to clash with one another. Yet observing the prior implication (above) suggests that guarding against the introduction of inconsistency must become a priority for ensuring enterprise data usability.
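To make the first three implications concrete, here is a minimal Python sketch, not a prescribed design: every rule name, field, and consumer community below is hypothetical. Each community defines its own validity rules, each rule carries a severity, and all rules are evaluated simultaneously against every record:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Severity(Enum):
    LOG_AND_LOAD = "log_and_load"   # record the issue but load the row anyway
    FAIL_PROCESS = "fail_process"   # halt the load; remediate before restarting

@dataclass
class ValidityRule:
    name: str
    consumer: str                   # the community that owns this definition of "valid"
    check: Callable[[dict], bool]
    severity: Severity

# Hypothetical rules: different communities, different notions of validity.
rules = [
    ValidityRule("finance_amount_present", "finance",
                 lambda r: r.get("amount") is not None, Severity.FAIL_PROCESS),
    ValidityRule("marketing_email_format", "marketing",
                 lambda r: "@" in str(r.get("email", "")), Severity.LOG_AND_LOAD),
]

def validate(record: dict) -> list:
    """Evaluate every community's rules at once; return all violations."""
    return [rule for rule in rules if not rule.check(record)]

violations = validate({"amount": None, "email": "a@example.com"})
```

The point of the sketch is the shape, not the specifics: validity is owned per consumer, severity travels with the rule, and no community's rules are skipped in favor of another's.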
These four implications require you to do more than just introduce methods for monitoring data quality and validity rules and generating alerts when an issue is identified (although that is definitely necessary). From a holistic, systemic standpoint, each set of data consumers defines a set of data validity policies. Those policies embed a number of data quality rules for establishing data usability for their particular purpose. In turn, the roles of operational data governance practitioners – notably data analysts and stewards – incorporate a variety of tasks. These include:
- Managing the policies.
- Reviewing how the rules in each policy coincide with those of other policies.
- Determining how to integrate validation tasks within end-to-end data flows.
- Establishing methods for generating alerts when issues are identified.
- Responding to those alerts.
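One of those tasks, reviewing how the rules in each policy coincide with those of other policies, can be sketched as a simple overlap check. This is an illustrative assumption rather than a standard technique; the policy names and field constraints below are made up:

```python
from collections import defaultdict

# Hypothetical policy registry: each consumer policy lists the fields it
# constrains and a label for the constraint it applies.
policies = {
    "finance":   {"amount": "non_null", "currency": "iso_4217"},
    "marketing": {"email": "rfc_5322", "amount": "non_negative"},
}

def fields_needing_review(policies: dict) -> dict:
    """Return fields constrained by more than one policy -- the candidates
    for a steward's conflict review, since the rules may clash."""
    owners = defaultdict(list)
    for policy, fields in policies.items():
        for field in fields:
            owners[field].append(policy)
    return {field: names for field, names in owners.items() if len(names) > 1}
```

Here "amount" would be flagged because both finance and marketing constrain it; the steward's job is then to decide whether those constraints can coexist.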
A more sophisticated approach to operational data governance
Operational data governance combines aspects of data policy management with data policy operationalization. This represents a deeper engagement within the data governance practice that goes way beyond the theatrics of setting up a data governance council and introducing data profiling tools. A more sophisticated approach will engage different pools of data consumers to better design and build data validation procedures directly into the application landscape.
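As a rough illustration of what building data validation procedures directly into the application landscape could look like, here is a hedged sketch in which rules are checked inline during the load itself: fatal violations halt the process for remediation, while non-fatal ones raise an alert but still load. Every name here is hypothetical.

```python
# Each rule is (name, check, fatal): fatal rules halt the load on violation,
# non-fatal rules alert but allow the record to load.
rules = [
    ("amount_present", lambda r: r.get("amount") is not None, True),
    ("email_has_at",   lambda r: "@" in str(r.get("email", "")), False),
]

def run_load(records, rules, on_alert):
    """Validate each record inline as part of the load step."""
    loaded = []
    for record in records:
        halt = False
        for name, check, fatal in rules:
            if not check(record):
                on_alert(name, record)   # notify the owning stewards/consumers
                halt = halt or fatal
        if halt:
            raise RuntimeError("halting load for remediation")
        loaded.append(record)            # non-fatal issues are logged and loaded
    return loaded
```

The design choice worth noticing is that validation is a first-class step of the data flow, with its alerting hook passed in, rather than a profiling exercise run after the fact.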