Traditional data management covers all the disciplines required to manage an organization's data resources. More specifically, data management usually includes:
- Architectures that encompass data, process and infrastructure.
- Policies and governance surrounding data privacy, data quality and data usage.
- Procedures that manage the data life cycle, from creation to sunset of the data (or data assets).
That said, how do we incorporate new technologies like Hadoop into our traditional data management disciplines? This series will address how enterprise metadata can help bridge the gap between those disciplines and our big data solutions. Let’s take a look at how it works.
Source system metadata provides the following information (a harvesting sketch follows the list):
- Source system usage and definition.
- Database and schema.
- Table and column.
- Relationships between tables.
- If you don't have database-managed foreign keys, most data modeling tools can infer referential integrity when you reverse engineer the schema.
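
Much of this metadata can be harvested directly from the source system's catalog. Here is a minimal sketch; it uses SQLite only so the example is self-contained, and the table and column names are hypothetical. Against a real source you would query its catalog views (such as information_schema) instead.

```python
import sqlite3

# Stand-in for a real source system; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        total       REAL
    );
""")

# Table- and column-level metadata: name, declared type, nullability.
for (table,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"):
    print(f"Table: {table}")
    for _, col, col_type, not_null, _, _ in conn.execute(
            f"PRAGMA table_info({table})"):
        print(f"  column {col} ({col_type}), "
              f"{'NOT NULL' if not_null else 'nullable'}")
    # Relationship metadata: declared foreign keys.
    for _, _, ref_table, from_col, to_col, *_ in conn.execute(
            f"PRAGMA foreign_key_list({table})"):
        print(f"  FK {table}.{from_col} -> {ref_table}.{to_col}")
```

In practice you would land output like this in a metadata repository, so downstream big data jobs can look up structure rather than hard-coding it.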
What does that give our big data solutions? As we move big data into our mainstream processing, source system metadata plays a central role. For example, it can help us with:
- Database structure information (used as source definitions by the tools or programs that consume the data into the big data platform).
- Column cardinality (used for understanding NULLs and the business meaning behind them; see the profiling sketch after this list).
- Business rules, which are implied in the relationships between tables (and help us understand how the business uses the data).
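
To make the cardinality point concrete, here is a minimal profiling sketch. It assumes the rows have already landed as Python dicts (say, parsed from an extract file); the column names and data are hypothetical.

```python
# Profile each column: distinct non-NULL values and NULL counts.
rows = [
    {"customer_id": 1, "region": "EMEA", "fax": None},
    {"customer_id": 2, "region": "EMEA", "fax": None},
    {"customer_id": 3, "region": "APAC", "fax": "555-0100"},
]

for column in rows[0]:
    values = [r[column] for r in rows]
    nulls = sum(v is None for v in values)
    distinct = len({v for v in values if v is not None})
    print(f"{column}: {distinct} distinct non-NULL value(s), "
          f"{nulls} NULL(s) out of {len(values)} rows")
```

A high NULL rate on a column like fax raises exactly the business question the metadata should answer: does NULL mean "not collected" or "not applicable"?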
You can use data profiling products to understand the quality of the data prior to consumption. You can also use these tools to design and maintain the policies within your governance initiative.
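
Even without a commercial profiling product, a simple scripted check can enforce one of those policies before the data is consumed. The sketch below tests implied referential integrity (no orphan foreign keys); the table and key values are hypothetical.

```python
# Check implied referential integrity: every order must reference a
# customer_id that actually exists in the customer data.
customer_ids = {1, 2, 3}                     # keys present in the source
orders = [(101, 1), (102, 2), (103, 9)]      # (order_id, customer_id)

orphans = [(oid, cid) for oid, cid in orders if cid not in customer_ids]
if orphans:
    print(f"Policy violation: {len(orphans)} order(s) reference "
          f"missing customers: {orphans}")
else:
    print("Referential integrity check passed")
```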
Big data platforms may be cheaper than traditional platforms, but work surrounding the design and the sources is still required, and it can be quite time-consuming. Keep in mind that our policies and governance may need to be tweaked for big data, but they should never be ignored.