In the first blog of this four-part series, we discussed traditional data management and how we can apply its principles to our big data platforms. We also discussed how metadata can help bridge gaps in our understanding of the data as we move to newer technologies. Part 2 focuses on transformation and movement metadata.
We move data, no doubt about that. We store it redundantly and transform it as it moves to various data stores across the enterprise. Whew! Now that we’ve gotten that out of the way, let’s talk about transformation and movement metadata. What do we gain from it?
Sometimes we encounter multiple systems that do the same thing (think insurance and claims engines). Transformation and movement metadata can help. Why? Because:
- This metadata allows us to understand the match, merge and integration rules that take place during a process.
- Whether we use a data quality tool, an ETL tool or both, this technical metadata enables us to understand how the data is manipulated to address specific quality issues. For example, it helps us understand the rules applied during address standardization.
- With this metadata, we can understand not only what we do to the data but also when it happens, and we’ll be aware of the many data objects the process affects.
- Transformation and movement metadata allows us to trace lineage and identify the objects impacted by a change (impact analysis).
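To make the lineage and impact-analysis points concrete, here is a minimal Python sketch. It assumes transformation and movement metadata is captured as simple records listing the source objects, the target object, the rule applied and the schedule. The `TransformStep` record, its fields and every job and object name below are hypothetical illustrations, not the layout of any particular ETL or data quality tool.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformStep:
    """One hypothetical transformation/movement metadata record."""
    name: str                # process (job) name
    sources: tuple[str, ...] # data objects the step reads
    target: str              # data object the step writes
    rule: str                # what the step does to the data
    schedule: str            # when the step runs


# Hypothetical metadata for three steps feeding a big data platform.
STEPS = [
    TransformStep("standardize_address", ("crm.customer",), "stage.customer_std",
                  "standardize street, city and postal code", "daily 01:00"),
    TransformStep("merge_claims", ("claims_a.claim", "claims_b.claim"), "stage.claim_merged",
                  "match on policy_id and claim_date, keep latest", "daily 02:00"),
    TransformStep("build_claim_mart", ("stage.customer_std", "stage.claim_merged"),
                  "lake.claim_by_customer", "join customers to merged claims", "daily 03:00"),
]


def build_graph(steps):
    """Turn metadata records into downstream and upstream adjacency maps."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for step in steps:
        for src in step.sources:
            downstream[src].add(step.target)
            upstream[step.target].add(src)
    return downstream, upstream


def reachable(start, edges):
    """Breadth-first search: every object reachable from start along edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def rules_for(target, steps=STEPS):
    """What is done to produce a target object, and when it happens."""
    return [(s.name, s.rule, s.schedule) for s in steps if s.target == target]


if __name__ == "__main__":
    downstream, upstream = build_graph(STEPS)
    # Impact analysis: what is affected if crm.customer changes?
    print("Impact:", sorted(reachable("crm.customer", downstream)))
    # Lineage: where does lake.claim_by_customer come from?
    print("Lineage:", sorted(reachable("lake.claim_by_customer", upstream)))
    print("Rules:", rules_for("stage.claim_merged"))
```

Walking the graph downstream answers "what is affected if this object changes?", while walking it upstream answers "where did this data come from?". A real metadata repository would capture far more detail, but the two traversals illustrate the idea.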
Our big data initiatives cannot possibly just dump in data and expect everyone who accesses it to deal with quality, duplication and other issues. Or can they? To me, it doesn’t make sense to move data without understanding its transformation and movement issues. With transformation and movement metadata, we can load the correct rendition of the data into our big data platform: transform once, use many times. On the other hand, we may drop the data in once and transform it many times to complete our analysis.
Depending on your unique business requirements, you need to decide when and how to transform your data. Use transformation and movement metadata to help guide your decision.