As I discussed in the first two posts of this series, metadata is useful in a variety of ways. Its importance starts at the source system and continues through the data movement and transformation processes and into operations.
Operational metadata, in particular, gives us information about the execution and completion of a process. With operational metadata, we can answer the following questions:
- When is this process executed? Attributes that help answer this include load date, create date, update date and delete date.
- How many times in the past was this process executed?
- How many times did the process fail?
- How many times did it execute successfully?
- How many security violations occurred?
- What were the numbers and types of data accesses?
- What do disk space usage and other indicators tell us about whether we are running out of space?
Not only does this information give us a good understanding of how things are running, it can quickly inform us when things are going wrong.
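To make a few of these questions concrete, here is a minimal sketch in Python of how run counts, failure counts and the last execution time might be derived from a log of process runs. The `RunRecord` schema, the process names and the sample data are all made up for illustration; a real operational metadata store will have its own structure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RunRecord:
    """One execution of a process (hypothetical schema, for illustration only)."""
    process: str
    started_at: datetime
    succeeded: bool


def summarize_runs(records, process):
    """Answer: how often did this process run, fail, succeed, and when last?"""
    runs = [r for r in records if r.process == process]
    failures = sum(1 for r in runs if not r.succeeded)
    # default=None covers a process that has never run
    last_run = max((r.started_at for r in runs), default=None)
    return {
        "total_runs": len(runs),
        "successes": len(runs) - failures,
        "failures": failures,
        "last_run": last_run,
    }


# Made-up sample log, not real data
RUN_LOG = [
    RunRecord("load_customers", datetime(2024, 1, 1, 2, 0), True),
    RunRecord("load_customers", datetime(2024, 1, 2, 2, 0), False),
    RunRecord("load_customers", datetime(2024, 1, 3, 2, 0), True),
    RunRecord("load_orders", datetime(2024, 1, 3, 3, 0), True),
]

summary = summarize_runs(RUN_LOG, "load_customers")
```

A rising failure count, or a `last_run` that is older than expected, is exactly the kind of early warning that operational metadata can provide.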
Our big data initiatives should consider this information when consuming enterprise data. It is much faster to look up the types of metadata listed above, which are hopefully published somewhere for enterprise-wide view, than to go and rediscover them yourself.
Here are questions I would ask, and some of the information I would look for:
- Where is the data source for the process, and is this the right source for big data to consume from?
- What is the schedule of updates? You need to make sure you consume the data from a source at the appropriate time (for example, after the source itself has been updated).
- From the security access report, you can see which business people are consuming or accessing specific data stores. Are those stores the strategic enterprise sources of data?
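The schedule question in particular lends itself to a simple automated check. The sketch below, again in Python, assumes the source publishes the time its last load completed; the function name `is_source_fresh` and the `max_age` threshold are hypothetical choices for illustration, not part of any particular tool.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_source_fresh(last_load: Optional[datetime],
                    now: datetime,
                    max_age: timedelta) -> bool:
    """Return True if the source finished a load within max_age before now.

    Assumes last_load and now use the same clock and time zone convention.
    """
    if last_load is None:
        # No load has ever been recorded, so there is nothing safe to consume
        return False
    age = now - last_load
    # A negative age means the recorded load time is in the future: treat as suspect
    return timedelta(0) <= age <= max_age


# Made-up example: a nightly load that finished at 02:00, checked at 06:00
check_time = datetime(2024, 1, 3, 6, 0)
fresh = is_source_fresh(datetime(2024, 1, 3, 2, 0), check_time, timedelta(hours=24))
stale = is_source_fresh(datetime(2024, 1, 1, 2, 0), check_time, timedelta(hours=24))
```

A big data ingestion job could run a check like this before reading from a source and skip or delay the read when the source has not been refreshed on schedule.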
Clearly, we have quite a bit of this information available to us. All of this should be part of the big data initiative for consuming enterprise data.