The second part of my data governance primer series addresses ways to "mind your metadata." I can just hear the collective groans, and perhaps a stifled yawn. Sorry, but metadata collection is one of those necessary evils: it may not be fun, but having metadata available as a resource for understanding your data and using it appropriately is invaluable. And you just might find some interesting surprises along the way!
Metadata: What Is It & Why Do I Need It?
As you start your root cause analysis, you need to examine existing data definitions (or the lack thereof). Metadata is the foundation of good data management and forms the basis for data governance. It may seem obvious, but metadata is fundamental to resolving data issues, and it is the first place to look when investigating data quality problems.
Metadata is “data about data.” Plain and simple. It includes descriptive information about the electronic data used in everyday business practices. Metadata covers the items usually found in a data dictionary, such as field name, field length, retention rules and security access. It can also include additional descriptive information: data origin (source or system), creation/entry date, method of creation (key entry or the result of a calculation), purpose of the data (intended use), how frequently it is updated or refreshed, and its current location in a database (table, view, schema). If a data element is the result of calculation logic or groupings (such as age categories), the business rules used to generate the resulting values should also be collected as part of the metadata.
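To make the list above concrete, here is a minimal sketch of a single data dictionary entry in Python. The class name, attribute names and the sample "age_group" values are all hypothetical, chosen only to mirror the items described in the paragraph; real metadata tools structure this differently. The one rule it enforces, that a calculated element must carry the business rule that produced it, follows the last sentence above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataElementMetadata:
    """One hypothetical data dictionary entry; attribute names are illustrative."""
    field_name: str
    field_length: int
    retention_rule: str
    security_access: str
    origin: str              # source system the value comes from
    created_on: date         # creation/entry date
    creation_method: str     # "key-entry" or "calculated"
    purpose: str             # intended use
    refresh_frequency: str   # how often it is updated or refreshed
    location: str            # e.g. "schema.table.column"
    business_rule: Optional[str] = None  # required when creation_method is "calculated"

    def __post_init__(self) -> None:
        # Derived elements should always document the rule that produced them.
        if self.creation_method == "calculated" and not self.business_rule:
            raise ValueError(f"{self.field_name}: calculated element needs a business rule")


# Hypothetical example: an age-category element derived from a birth date.
age_group = DataElementMetadata(
    field_name="age_group",
    field_length=10,
    retention_rule="Retain 7 years",
    security_access="Internal",
    origin="Enrollment system",
    created_on=date(2024, 1, 15),
    creation_method="calculated",
    purpose="Reporting by age band",
    refresh_frequency="Monthly",
    location="reporting.member_summary.age_group",
    business_rule="0-17 -> 'Child'; 18-64 -> 'Adult'; 65+ -> 'Senior'",
)
print(age_group.field_name, "|", age_group.business_rule)
```

Failing loudly when the business rule is missing is a deliberate choice: it keeps undocumented calculated fields from quietly entering the dictionary.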
A good example of metadata that you may use every day is the "document properties" of a Microsoft Word document. This feature captures the original creation date, the most recent access and update times, the document's creator, and counts of characters, words and pages. If the document is meant to be private, that is indicated in its properties as well. You can also tag the document with keywords to make it easier for you or others to find.
What are some of the benefits of metadata management?
- Clarifies rules for data entry
- Reduces ambiguity around appropriate use of data elements
- Eliminates problems associated with not having data definitions, business rules or transformation logic available
- Documents the legitimate values for each data element so they can be validated
- Provides evidence to regulators that security and confidentiality are protected
- Centralizes the storage and accessibility of metadata for end-users
- Reduces the amount of effort required to research data results
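One of the benefits listed above, validating values at the data element level, is easy to picture in code. The sketch below assumes the valid-value domains and maximum field lengths have already been pulled from a metadata repository into two Python dictionaries; the element names and allowed values are hypothetical.

```python
# Hypothetical valid-value domains and field lengths taken from a metadata repository.
VALID_VALUES = {
    "gender_code": {"F", "M", "U"},
    "status": {"Active", "Inactive", "Pending"},
}
MAX_LENGTH = {"gender_code": 1, "status": 8}


def validate_record(record: dict[str, str]) -> list[str]:
    """Return the problems found when checking a record against its metadata."""
    problems = []
    for name, value in record.items():
        # Check the value against the documented domain, if one exists.
        allowed = VALID_VALUES.get(name)
        if allowed is not None and value not in allowed:
            problems.append(f"{name}: '{value}' is not a documented value")
        # Check the value against the documented field length, if one exists.
        limit = MAX_LENGTH.get(name)
        if limit is not None and len(value) > limit:
            problems.append(f"{name}: length {len(value)} exceeds {limit}")
    return problems


print(validate_record({"gender_code": "F", "status": "Active"}))  # []
print(validate_record({"gender_code": "X", "status": "Archived"}))
```

The point of the design is that the rules live in the metadata, not in the code: when a new status value is approved, only the repository entry changes.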
A Metadata Management Repository is a central location or system for collecting and storing metadata that may otherwise be scattered across the organization (data dictionaries, systems, spreadsheets, or people’s brains). The repository stores detailed definitions centrally on the network, where other users can find them.
There are three general types of metadata that should be included in this repository:
- Business Metadata – Facilitates identification, understanding and appropriate use of existing data elements. These include clear business names and descriptions, relevant business rules, descriptions of the data sources, security and privacy rules, etc.
- Technical Metadata – Describes the technical attributes of data such as physical location (host server, database server, schema, etc.), data types, any transformations applied as well as the domain of valid values, relationships to other data elements, precision, and lineage. Technical metadata is used by business users and IT staff to design efficient databases, queries and applications – and to reduce duplication of data.
- Operational Metadata – Describes the attributes of routine operations on data and related statistics. These include job schedules and descriptions, data movement and transformation processes, read, update and performance statistics, volume statistics, and backup and archival information. Operational metadata is used by operations staff and DBAs to tune the system and keep it running efficiently. It is also used by business users to track events such as the “last use” of a field or the “last load” of a data element.
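The three types above can be pictured as three parts of a single repository entry. The following is a minimal in-memory sketch, not a model of any real metadata product: every class, method and sample value here is hypothetical. It groups business, technical and operational metadata for one element and uses the operational "last load" date to flag elements that have not been refreshed recently.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BusinessMetadata:
    business_name: str
    description: str
    privacy_rule: str


@dataclass
class TechnicalMetadata:
    location: str                 # server/database/schema/table/column
    data_type: str
    valid_values: tuple[str, ...]
    lineage: tuple[str, ...]      # upstream elements this one is derived from


@dataclass
class OperationalMetadata:
    load_schedule: str
    last_load: datetime
    last_use: datetime


@dataclass
class RepositoryEntry:
    business: BusinessMetadata
    technical: TechnicalMetadata
    operational: OperationalMetadata


class MetadataRepository:
    """A hypothetical in-memory stand-in for a central metadata repository."""

    def __init__(self) -> None:
        self._entries: dict[str, RepositoryEntry] = {}

    def register(self, key: str, entry: RepositoryEntry) -> None:
        self._entries[key] = entry

    def lookup(self, key: str) -> RepositoryEntry:
        return self._entries[key]

    def stale(self, as_of: datetime, max_days: int) -> list[str]:
        """Keys whose last load is more than max_days whole days before as_of."""
        return [key for key, entry in self._entries.items()
                if (as_of - entry.operational.last_load).days > max_days]


repo = MetadataRepository()
repo.register(
    "claims.claim_status",
    RepositoryEntry(
        BusinessMetadata("Claim Status", "Current processing state of a claim", "Internal use only"),
        TechnicalMetadata("dw01.claims.claim_header.claim_status", "VARCHAR(10)",
                          ("Open", "Paid", "Denied"), ("source.claims.status_cd",)),
        OperationalMetadata("Nightly 02:00", datetime(2024, 3, 1, 2, 0), datetime(2024, 3, 4, 9, 30)),
    ),
)
entry = repo.lookup("claims.claim_status")
print(entry.business.business_name, entry.technical.data_type)
print(repo.stale(as_of=datetime(2024, 3, 5), max_days=1))  # ['claims.claim_status']
```

Keeping all three types under one key means a business user and a DBA are looking at the same record, which is exactly the confusion the repository is meant to prevent.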
Exciting stuff? Maybe not, but the whole point of metadata is to make information about your data available to many users when they need it, to keep that information current, and to avoid confusion about how the data should be used. So if you appreciate having a clean house – and knowing where you keep your vacuum cleaner – you will also appreciate having good metadata!