What are the principles of data management for analytics? It may depend on who you ask. As I see it, there are five key things:
- Data management strategy.
- Ownership versus stewardship of the data.
- Metadata strategy.
- Data governance, including data quality and data life cycle.
- Data usage.
With these five subjects in mind, let’s apply them to business and data analytics.
Data management strategy
When using data for analytics, start with the corporate data management strategy (if there is one) and follow the procedures it outlines. For example, if an analytic project will bring in data from a third-party vendor (purchased data or data scraped from a website), that data should be introduced into the corporation according to the data management strategy, and every strategy should spell out how to handle purchased data. The strategy could include a meeting with the board to introduce the data and to recommend how to govern and manage it over time. The data management strategy should answer questions like:
- Where did the data come from?
- Will we get more of this data periodically?
- Should we keep metadata about this data?
- How will the data be used?
- When does this data become irrelevant?
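One way to make these questions concrete is to record the answers for each incoming dataset and flag what is still open. The sketch below is a minimal illustration only: the `DatasetIntake` class, its field names and the sample vendor feed are hypothetical and do not come from any particular strategy or tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetIntake:
    """Hypothetical intake record answering the strategy questions for one dataset."""
    name: str
    source: str                       # Where did the data come from?
    refresh_frequency: Optional[str]  # Will we get more periodically? None = one-time load
    keep_metadata: bool               # Should we keep metadata about this data?
    intended_use: str                 # How will the data be used?
    retire_after: Optional[date]      # When does it become irrelevant? None = not yet decided

    def open_questions(self) -> list[str]:
        """Return the strategy questions that are still unanswered."""
        missing = []
        if not self.source.strip():
            missing.append("Where did the data come from?")
        if not self.intended_use.strip():
            missing.append("How will the data be used?")
        if self.retire_after is None:
            missing.append("When does this data become irrelevant?")
        return missing


# Hypothetical purchased dataset, still missing a retirement decision
vendor_feed = DatasetIntake(
    name="vendor_demographics",
    source="Third-party vendor (purchased)",
    refresh_frequency="monthly",
    keep_metadata=True,
    intended_use="Customer segmentation model",
    retire_after=None,
)
print(vendor_feed.open_questions())  # ['When does this data become irrelevant?']
```

A record like this could be reviewed at the same meeting where the data is introduced, so no dataset enters the corporation with unanswered questions.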
Ownership versus stewardship of data used for analytics
With analytics, we use data from many places to help us make business decisions. But who owns the data? Some companies have created data domain managers for subject areas like customer, product, etc., where those stewards basically "own" the data. But just as my husband "owns" the television remote, ownership can be more assumed than agreed, so we must state it explicitly for every project. Be sure to state clearly which domain steward or subject matter expert (SME) owns the data.
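A simple way to enforce "state ownership for every project" is to check each dataset a project uses against a registry of named stewards or SMEs. This is a minimal sketch; the `OWNERS` registry, dataset names and `unowned` helper are all hypothetical.

```python
# Hypothetical registry: dataset name -> named domain steward or SME
OWNERS: dict[str, str] = {
    "customer_master": "Customer domain steward",
    "product_catalog": "Product domain steward",
}


def unowned(project_datasets: list[str], owners: dict[str, str]) -> list[str]:
    """Return the datasets a project uses that have no stated owner."""
    return [d for d in project_datasets if not owners.get(d, "").strip()]


print(unowned(["customer_master", "web_clickstream"], OWNERS))  # ['web_clickstream']
```

Any name the check returns needs an owner assigned before the project relies on it.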
Metadata collection, storage and dissemination for analytics
Sometimes in analytics we build so quickly that we don't have time to think about metadata. Metadata tells us the navigation, structure, usage and definition of data. It also tells us where the data came from, how often it is updated, etc. Storing this information for data life cycle management is very important to the organization. Metadata can answer questions about when the data was last used, as well as when to retire specific analytical data. Much like clothing you haven't worn in a while, data that has not been queried in the last three months may no longer be needed.
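If the stored metadata includes a last-queried date for each dataset, finding retirement candidates is a straightforward comparison. The sketch below assumes such a date is available (the dataset names and dates are made up) and approximates "three months" as 90 days; adjust the threshold to your own policy.

```python
from datetime import date, timedelta

# Assumption: "three months" approximated as 90 days
STALE_AFTER = timedelta(days=90)


def retirement_candidates(last_queried: dict[str, date], today: date,
                          stale_after: timedelta = STALE_AFTER) -> list[str]:
    """Return datasets not queried within the threshold, least recently used first."""
    stale = [(d, name) for name, d in last_queried.items() if today - d > stale_after]
    return [name for _, name in sorted(stale)]


# Hypothetical last-queried dates pulled from stored metadata
last_queried = {
    "campaign_2022_scores": date(2024, 1, 15),
    "daily_sales_summary": date(2024, 6, 28),
    "old_web_scrape": date(2023, 11, 2),
}
print(retirement_candidates(last_queried, today=date(2024, 7, 1)))
# ['old_web_scrape', 'campaign_2022_scores']
```

The result is a list of candidates to review, not an automatic delete list.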
Data governance for analytics
Data governance, as part of a data management strategy, relies on the metadata listed above as well as the quality and life cycle of the data. Keep in mind that:
- The quality of the data is based on the requirements. Some analytical projects require data in its rawest form (no enhancements, no corrections).
- The data life cycle is also based on requirements. Some data or compilations of data are only required for a certain amount of time, and for that data the historical perspective simply isn't important. That said, for any data coming into the enterprise, we need to understand its life cycle.
Use of data analytics
Analytics is all about usage. We create the data in a format for the business user, based on their requirements. Here's an example. I was working for a company that had a huge investment in a reporting and analytics tool. This tool stored metadata about every query and every attribute touched or used by any individual or process. By analyzing this stored technical metadata on a monthly basis, we could determine what data and reports were no longer used. Then, based on our analysis, we contacted business users and determined what data and reports could be eliminated. It's good practice not to keep data you don't need anymore, and that practice should be one of the foundations of the larger enterprise data strategy.
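The monthly analysis described above boils down to counting how often each report or dataset appears in the query metadata and listing the ones with no activity. The sketch below is a hypothetical illustration: real reporting tools expose usage metadata in their own formats, so the `(date, user, object)` log rows, object names and `monthly_usage` helper are assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical query-log rows: (query date, user or process, object touched)
query_log = [
    (date(2024, 6, 3), "analyst_a", "sales_by_region_report"),
    (date(2024, 6, 10), "analyst_b", "sales_by_region_report"),
    (date(2024, 6, 21), "etl_job", "customer_master"),
    (date(2024, 5, 14), "analyst_c", "legacy_margin_report"),
]
# Hypothetical inventory of everything the tool currently maintains
inventory = ["sales_by_region_report", "customer_master", "legacy_margin_report"]


def monthly_usage(log, inventory, year: int, month: int) -> dict[str, int]:
    """Count accesses per inventory object within one calendar month."""
    counts = Counter(obj for d, _, obj in log if d.year == year and d.month == month)
    return {obj: counts[obj] for obj in inventory}


usage = monthly_usage(query_log, inventory, 2024, 6)
unused = [obj for obj, n in usage.items() if n == 0]
print(unused)  # ['legacy_margin_report']
```

As in the example above, the unused list is a starting point for conversations with business users rather than a trigger for automatic deletion.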