In recent years, we practitioners in the data management world have been pretty quick to conflate “data governance” with “data quality” and “metadata.” Many tools marketed under “data governance” have emerged – yet when you inspect their capabilities, you see that these tools largely cover data validation and data standardization.
Unfortunately, we sometimes miss the link between the objective expectation of high data quality and the qualification of a data set for its usefulness in achieving the goals of a particular policy or directive. From that standpoint, what is defined as governance is far more comprehensive than just applying data quality and validity rules. You also need to consider how data governance is tightly coupled with all other forms of governance.
As an example, consider the Food and Drug Administration's (FDA's) oversight of compliance with 21 CFR Part 11, which is a regulation associated with overseeing data lineage. The associated FDA guidance specifies that:
“Each specific study protocol should identify each step at which a computerized system will be used to create, modify, maintain, archive, retrieve or transmit source data. …”
This regulation directs governance aspects of data administration: capturing data lineage with specific annotations regarding the different life cycle stages. Observance of the regulatory policy requires observance of specific data policies, but these policies do not imply any kind of data validation or cleansing.
In addition, these data policies require oversight of capturing and documenting data lineage, which is sometimes seen as metadata. However, the governance aspect is not just about capturing the metadata – it also means ensuring that the process for capturing metadata is always followed and that there is an audit trail to prove it.
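To make the distinction concrete, here is a minimal Python sketch of checking that lineage capture was actually followed, rather than checking the data itself. The six steps come from the guidance text quoted above; everything else (the names Step, LineageEntry and unrecorded_steps, the data set, the systems and the roles) is hypothetical and only for illustration, not taken from the regulation or from any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Step(Enum):
    """The life cycle steps named in the quoted guidance text."""
    CREATE = "create"
    MODIFY = "modify"
    MAINTAIN = "maintain"
    ARCHIVE = "archive"
    RETRIEVE = "retrieve"
    TRANSMIT = "transmit"


@dataclass(frozen=True)
class LineageEntry:
    """One audit-trail record (hypothetical shape)."""
    dataset: str
    step: Step
    system: str
    recorded_by: str


def unrecorded_steps(protocol, trail):
    """Return the (dataset, step) pairs the protocol declares but the trail lacks."""
    recorded = {(entry.dataset, entry.step) for entry in trail}
    return [pair for pair in protocol if pair not in recorded]


# Hypothetical protocol: the steps at which a computerized system touches the data.
protocol = [
    ("visit_data", Step.CREATE),
    ("visit_data", Step.MODIFY),
    ("visit_data", Step.ARCHIVE),
]

# Hypothetical audit trail: the MODIFY step was never documented.
trail = [
    LineageEntry("visit_data", Step.CREATE, "edc_system", "site_coordinator"),
    LineageEntry("visit_data", Step.ARCHIVE, "archive_store", "data_manager"),
]

gaps = unrecorded_steps(protocol, trail)
```

Note that nothing here validates or cleanses a single data value. The check is purely about whether the process of documenting lineage was observed, which is the governance concern.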
This introduces some different ideas about how tools can support data governance from the holistic perspective. It is not sufficient to enable the definition of business data quality rules for data validation. Data governance must encompass management of the full life cycle of a data policy – its definition, approval, implementation and the means of ensuring its observance. We will look at how this can be manifested within a tool in upcoming posts.
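As a rough illustration of what managing the life cycle of a data policy could mean, the following Python sketch models the four stages named above as a simple forward-only progression with a recorded history. The class names, the stage labels and the actors are assumptions made for this example; a real tool would likely need rejection, revision and retirement paths as well.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Life cycle stages of a data policy (labels are illustrative)."""
    DEFINED = 1
    APPROVED = 2
    IMPLEMENTED = 3
    MONITORED = 4


@dataclass
class DataPolicy:
    name: str
    stage: Stage = Stage.DEFINED
    history: list = field(default_factory=list)

    def advance(self, actor: str) -> None:
        """Move to the next stage and record who moved it."""
        if self.stage is Stage.MONITORED:
            raise ValueError("policy is already in its final stage")
        next_stage = Stage(self.stage.value + 1)
        # The history doubles as an audit trail for the policy itself.
        self.history.append((self.stage, next_stage, actor))
        self.stage = next_stage


policy = DataPolicy("capture lineage for source data")
policy.advance("data_steward")        # DEFINED -> APPROVED
policy.advance("governance_council")  # APPROVED -> IMPLEMENTED
```

The point of the sketch is that the policy, not the data value, is the managed object: each transition is attributable to someone, and the final stage stands for ongoing observance rather than a one-time rule definition.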
FDA, “Guidance for Industry: Computerized Systems Used in Clinical Investigations,” see http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM070266.pdf