Data management for analysis – Feeding the analytical monster more than once

(Otherwise known as Truncate – Load – Analyze – Repeat!)

After you’ve prepared data for analysis and then analyzed it, how do you complete this process again?  And again? And again?

Most analytical applications are created to truncate the prior data, load new data for analysis, analyze it and repeat the process as required by analytics users.

Truncating the data in an application may be as easy as truncating a few tables, or it may entail a more sophisticated means of getting rid of data from the prior analysis (it depends on the software). The assumptions are that the analytics software has been installed and that the user is someone who knows the ins and outs of the tool. In any case, it’s highly recommended that this process be as automated as possible.
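As a rough illustration of what an automated truncate-and-load step might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sales_staging`, its columns and the helper `truncate_and_load` are hypothetical, chosen only for the example; your analytics tool may need something quite different to clear out prior data.

```python
import sqlite3

def truncate_and_load(conn, table, rows):
    """Empty a staging table, then load a fresh batch, all in one transaction.

    `table` is assumed to be a trusted, hard-coded name (it is placed
    directly into the SQL text), and the two columns are hypothetical.
    """
    with conn:  # commits if the block succeeds, rolls back on an error
        # SQLite has no TRUNCATE statement; DELETE without WHERE empties the table.
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(
            f"INSERT INTO {table} (customer_id, amount) VALUES (?, ?)", rows
        )
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")  # in-memory database, just for the sketch
conn.execute("CREATE TABLE sales_staging (customer_id INTEGER, amount REAL)")

first = truncate_and_load(conn, "sales_staging", [(1, 10.0), (2, 20.0)])
second = truncate_and_load(conn, "sales_staging", [(3, 5.0)])
print(first, second)
```

Because the delete and the inserts share one transaction, a failed load leaves the prior data in place rather than an empty table.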

In an earlier post, we talked about how extremely time consuming data preparation processes can be. Look for ways to automate these processes, especially if multiple sources must be joined for the analysis. If you profiled the data the first time, you'll already know which data quality issues to look for the next time you prepare a load. (Remember: Some data sources ALWAYS have bad data. So be ready and know those sources!)
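To make the profiling idea concrete, here is a small sketch of a repeatable check you could run before each load. It counts blank values per column and flags duplicated keys. The function name `profile`, the sample rows and the `customer_id` key are all made up for illustration; a real profiling tool will do far more.

```python
from collections import Counter

def profile(rows, key):
    """Count missing values per column and find duplicated keys.

    `rows` is a list of dicts (one per record); `key` names the column
    that is supposed to be unique. Both are assumptions for this sketch.
    """
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value == "":
                missing[column] += 1
    key_counts = Counter(row[key] for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicate_keys": duplicates,
    }

# Hypothetical extract from a source that is known to have bad data.
sample = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 2, "email": None},
]
report = profile(sample, "customer_id")
print(report)
```

Running the same check on every load makes it easy to see whether a source is getting better or worse over time.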

Repeating the process will be so much easier if you automate as many of the preparation and load processes as possible. I recommend documenting all the processes for future use. Consider a simple Word document or a spreadsheet – but whatever you choose, please write it down. Why? The analytics group may want to change a source or some attributes the next time around, and in that case your notes will come in very handy. Or, what if you had to hand off this process to a summer intern or to production support? It would be so much easier to do that if you had documented the processes.

To make things even easier to repeat, consider documenting all the data quality issues and anomalies you found during the first load.
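If a spreadsheet is your documentation tool of choice, one possible approach is to have the load process append each anomaly to a CSV file that opens directly in a spreadsheet. The sketch below assumes a hypothetical set of columns (`load_date`, `source`, `issue`, `rows_affected`) and a hypothetical helper `log_anomaly`; adjust both to whatever you actually need to record.

```python
import csv
import io
from datetime import date

# Hypothetical columns for the anomaly log.
FIELDS = ["load_date", "source", "issue", "rows_affected"]

def log_anomaly(handle, load_date, source, issue, rows_affected, write_header=False):
    """Write one anomaly record as a CSV row to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow({
        "load_date": load_date.isoformat(),
        "source": source,
        "issue": issue,
        "rows_affected": rows_affected,
    })

# StringIO stands in for a real file opened with open(path, "a", newline="").
buffer = io.StringIO()
log_anomaly(buffer, date(2024, 1, 15), "crm_export", "blank email", 42, write_header=True)
log_anomaly(buffer, date(2024, 1, 15), "billing_feed", "duplicate customer_id", 3)

lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```

The dates, source names and counts here are invented sample values, not results from any real load.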

About Author

Joyce Norris-Montanari

President of DBTech Solutions, Inc

Joyce Norris-Montanari, CBIP-CDMP, is president of DBTech Solutions, Inc. Joyce advises clients on all aspects of architectural integration, business intelligence and data management, and on technology such as ETL, profiling, database, data quality and metadata tools. She speaks frequently at data warehouse conferences and is a contributor to several trade publications. She co-authored Data Warehousing and E-Business (Wiley & Sons) with William H. Inmon and others. Joyce has managed and implemented data integrations, data warehouses and operational data stores in industries including education, pharmaceuticals, restaurants, telecommunications, government, health care, finance, oil and gas, insurance, research and development and retail. She can be reached at jmontanari@earthlink.net.
