Data management for analytics: Best practices and examples

Looking for some best practices for data management when you’re doing analytics? Experience has shown me that data management best practices should encompass the areas of governance, quality and storage. I’ll share a few examples.

Data governance

The other day I was working on a project in a data warehouse environment where the analytics team wanted to add new data in the semantic layer, but not in the core layer. In a case like this, you need to consider several key questions:

  • How can I rebuild the semantics if the data is not in the core layer?
  • Where does that data come from?
    • How can I get it again?
    • How can I rebuild history if it’s not in the core layer?
  • Is all the data in the semantic layer supposed to have data governance applied to it? And are there corporate data governance policies that include third-party vendor data?
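One way to keep these questions from getting lost is to record the answers for each data set before it lands in the semantic layer. The sketch below is only an illustration: the `DatasetRecord` class, its field names and the `open_questions` helper are hypothetical, not part of any tool or standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical record of the answers to the questions above."""
    name: str
    source: str             # where the data comes from
    reload_procedure: str   # how to get it again ("" if unknown)
    history_location: str   # where persistent history is kept ("" if none)
    in_core_layer: bool     # is the data also stored in the core layer?
    governed: bool          # do corporate governance policies apply?


def open_questions(rec: DatasetRecord) -> List[str]:
    """Return the questions that are still unanswered for a data set."""
    issues = []
    if not rec.reload_procedure:
        issues.append("no documented way to obtain the data again")
    if not rec.in_core_layer and not rec.history_location:
        issues.append("history cannot be rebuilt: not in core, no persistent history")
    if not rec.governed:
        issues.append("not covered by governance policy; confirm this is intentional")
    return issues


# Made-up example: vendor data added straight to the semantic layer.
vendor = DatasetRecord(
    name="vendor_demographics",
    source="third-party vendor",
    reload_procedure="",
    history_location="",
    in_core_layer=False,
    governed=False,
)
vendor_issues = open_questions(vendor)
for issue in vendor_issues:
    print(f"{vendor.name}: {issue}")
```

A record with every field filled in, and with the data either in the core layer or backed by persistent history, returns an empty list.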

Data quality

This is not the first time that I've dealt with a requirement where data is needed in the semantic layer for a specific report or analysis. For example, let’s say a company purchased data from a vendor. You may want to join that data with your existing data, but only in the semantic layer of the data warehouse, and only for a specific report (probably joined on something like geographic area). In that case, why would you need to bring the data into the core layer of the data warehouse from staging? You may not need to, provided a few assumptions about the data hold:

  • You do not govern third-party data using your policies. This data is for analysis ONLY. (Don't forget to consider what data quality measures will be required on the third-party data.)
  • You must keep persistent history on this data in staging or a file structure if you ever want to rebuild the history in the semantic layer.
  • The data is required for analysis, and no other group will need this data.
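Under those assumptions, the setup might look like the following minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for a real warehouse, and all table, view and column names (`core_sales`, `stg_vendor_demographics`, `sem_sales_by_region`) are hypothetical. Vendor deliveries stay in staging with a load date, so history persists, and the semantic view joins the latest delivery to core data on geographic area.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Core layer: governed enterprise data (hypothetical table).
    CREATE TABLE core_sales (region TEXT, sale_date TEXT, amount REAL);
    -- Staging: every vendor delivery is kept, tagged with its load date.
    CREATE TABLE stg_vendor_demographics (
        region TEXT, population INTEGER, load_date TEXT
    );
""")
conn.executemany("INSERT INTO core_sales VALUES (?, ?, ?)", [
    ("North", "2024-01-15", 1200.0),
    ("North", "2024-02-10", 800.0),
    ("South", "2024-01-20", 500.0),
])
conn.executemany("INSERT INTO stg_vendor_demographics VALUES (?, ?, ?)", [
    ("North", 10000, "2024-01-01"),
    ("North", 10500, "2024-02-01"),  # newer delivery; the older row is kept
    ("South", 4000, "2024-01-01"),
])

# Semantic layer: a view that joins core sales to the most recent
# vendor delivery for each region. The vendor data never enters core.
conn.execute("""
    CREATE VIEW sem_sales_by_region AS
    SELECT s.region,
           SUM(s.amount) AS total_sales,
           v.population
    FROM core_sales AS s
    JOIN stg_vendor_demographics AS v ON v.region = s.region
    WHERE v.load_date = (
        SELECT MAX(latest.load_date)
        FROM stg_vendor_demographics AS latest
        WHERE latest.region = v.region
    )
    GROUP BY s.region, v.population
""")

rows = conn.execute(
    "SELECT region, total_sales, population "
    "FROM sem_sales_by_region ORDER BY region"
).fetchall()
for row in rows:
    print(row)
```

Because the older North delivery is still in staging, the view could be redefined to report against any earlier load date, which is the point of keeping persistent history there.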

Data storage

Another discipline in data warehousing says that all data in the semantic layer must be able to be recreated from the core layer of the data warehouse. This discipline requires more integration and relationship work within the core layer, but it may pay off later when you want to rebuild the semantics. The goal should be to roll and re-roll the semantics any way business users need them. Storing more data requires more design work and more storage space, because the data then resides in the staging, core and semantic layers of the data warehouse.
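Here is a rough sketch of what "recreate from core" can mean in practice, again using Python's built-in `sqlite3` module in place of a real warehouse. The table names (`core_sales`, `sem_monthly_sales`) and the `rebuild_semantic` function are assumptions made for illustration. The semantic table is treated as disposable: it is dropped and rebuilt entirely from core data.

```python
import sqlite3


def rebuild_semantic(conn: sqlite3.Connection) -> None:
    """Drop the semantic table and recreate it from core data only."""
    conn.execute("DROP TABLE IF EXISTS sem_monthly_sales")
    conn.execute("""
        CREATE TABLE sem_monthly_sales AS
        SELECT region,
               substr(sale_date, 1, 7) AS month,
               SUM(amount) AS total_sales
        FROM core_sales
        GROUP BY region, substr(sale_date, 1, 7)
    """)
    conn.commit()


def read_semantic(conn: sqlite3.Connection) -> list:
    """Return the semantic rows in a stable order."""
    return conn.execute(
        "SELECT region, month, total_sales "
        "FROM sem_monthly_sales ORDER BY region, month"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_sales (region TEXT, sale_date TEXT, amount REAL)")
conn.executemany("INSERT INTO core_sales VALUES (?, ?, ?)", [
    ("North", "2024-01-15", 1200.0),
    ("North", "2024-01-20", 300.0),
    ("North", "2024-02-10", 800.0),
    ("South", "2024-01-20", 500.0),
])

rebuild_semantic(conn)
first = read_semantic(conn)

# A late-arriving core row is picked up simply by re-rolling the semantics.
conn.execute("INSERT INTO core_sales VALUES (?, ?, ?)", ("South", "2024-02-05", 250.0))
rebuild_semantic(conn)
second = read_semantic(conn)
print(second)
```

The design choice here is that nothing is ever written to the semantic table directly, so a rebuild can never lose information that exists only in that layer.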

Looking ahead

Best practices for analytics reside within the corporate data governance policy and should be based on the requirements of the business community. There will come a time when you must address a requirement like some of those listed above. When this happens, your objective is to be as flexible as possible in meeting business needs quickly, without jeopardizing corporate data governance policies.

Some companies have committees or councils that analyze the data governance requirements of any new data that’s brought into the enterprise. Not a bad idea – as long as it doesn’t create a bottleneck for meeting business needs quickly.

What you don’t want is for the business to bring data into the enterprise without first knowing how to manage, store and use it. It’s as simple as that. Consider today’s world, where data streams constantly. You’ll need to be very flexible about storing and using this data to meet new business requirements. Consider including governance and policies on streaming data now – and be willing to enhance existing policies over time, to meet ever-changing corporate needs.

About Author

Joyce Norris-Montanari

President of DBTech Solutions, Inc

Joyce Norris-Montanari, CBIP-CDMP, is president of DBTech Solutions, Inc. Joyce advises clients on all aspects of architectural integration, business intelligence and data management, and on technology, including ETL, profiling, database, quality and metadata tools. She speaks frequently at data warehouse conferences and is a contributor to several trade publications. She co-authored Data Warehousing and E-Business (Wiley & Sons) with William H. Inmon and others. Joyce has managed and implemented data integrations, data warehouses and operational data stores in industries such as education, pharmaceuticals, restaurants, telecommunications, government, health care, financial services, oil and gas, insurance, research and development, and retail. She can be reached at jmontanari@earthlink.net.
