(Otherwise known as Truncate – Load – Analyze – Repeat!)
After you’ve prepared data for analysis and then analyzed it, how do you do it all again? And again? And again?
Most analytical applications are designed to truncate the prior data, load new data, analyze it, and repeat the cycle as often as analytics users need.
Truncating the data in an application may be as simple as truncating a few tables, or it may require a more sophisticated way of removing the data from the prior analysis, depending on the software. This assumes that the analytics software is already installed and that the user knows the ins and outs of the tool. In either case, I highly recommend automating this process as much as possible.
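To make the truncate, load, analyze, repeat cycle a bit more concrete, here is a minimal Python sketch. It assumes the data lives in SQLite (which has no TRUNCATE statement, so `DELETE FROM` plays that role) and uses a hypothetical `sales_staging` table and a hypothetical CSV feed with `region` and `amount` columns. Your analytics tool will almost certainly have its own way of clearing prior data, so treat this as an illustration of the pattern rather than a recipe.

```python
import csv
import io
import sqlite3

# Hypothetical staging table(s) cleared before each new load.
# Names come from this trusted constant, never from user input.
TABLES = ["sales_staging"]


def truncate(conn, tables):
    # SQLite has no TRUNCATE; an unqualified DELETE empties the table.
    for table in tables:
        conn.execute(f"DELETE FROM {table}")


def load(conn, csv_text):
    # Parse the (hypothetical) feed and bulk-insert it.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["region"], float(r["amount"])) for r in reader]
    conn.executemany(
        "INSERT INTO sales_staging (region, amount) VALUES (?, ?)", rows
    )
    return len(rows)


def analyze(conn):
    # A stand-in "analysis": total amount per region.
    query = (
        "SELECT region, SUM(amount) FROM sales_staging "
        "GROUP BY region ORDER BY region"
    )
    return dict(conn.execute(query).fetchall())


def run_cycle(conn, csv_text):
    # The connection context manager commits the truncate and the load
    # together, or rolls both back if anything fails.
    with conn:
        truncate(conn, TABLES)
        load(conn, csv_text)
    return analyze(conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_staging (region TEXT, amount REAL)")
    print(run_cycle(conn, "region,amount\nEast,10\nWest,5\nEast,2.5\n"))
    print(run_cycle(conn, "region,amount\nWest,7\n"))  # prior data is gone
```

Wrapping the truncate and the load in one transaction means a failed load leaves the previous data in place instead of an empty table, which is one reasonable choice when users may still be looking at the last analysis.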
In an earlier post, we talked about how extremely time-consuming data preparation can be. Look for ways to automate these processes, especially if multiple sources must be joined for the analysis. If you profiled the data the first time, it will be easier to know which data quality issues to look for the next time you prepare for a load. (Remember: some data sources ALWAYS have bad data, so be ready and know those sources!)
Repeating the process will be much easier if you automate as many of the preparation and load steps as possible. I also recommend documenting all the processes for future use. A simple Word document or spreadsheet will do; whatever you choose, please write it down. Why? The analytics group may want to change a source or some attributes next time around, and in that case your notes will come in very handy. Or what if you had to hand off this process to a summer intern or to production support? That handoff is far easier when the processes are documented.
To make the process even easier to repeat, also document all the data quality issues and anomalies you found during the first load.
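One lightweight way to capture those anomalies is to turn them into checks that run on every load and append their findings to a log. The Python sketch below assumes a hypothetical `sales_feed` CSV source with `region` and `amount` columns, and the three checks are only examples; you would replace them with the issues you actually found when profiling your own sources.

```python
import csv
import io
from datetime import date


def find_anomalies(source_name, csv_text):
    """Run example checks on a hypothetical feed; return (source, line, field, issue)."""
    anomalies = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so the first data row is line 2
    # (assumes no quoted fields spanning multiple lines).
    for line_no, row in enumerate(reader, start=2):
        region = (row.get("region") or "").strip()
        amount = (row.get("amount") or "").strip()
        if not region:
            anomalies.append((source_name, line_no, "region", "missing value"))
        try:
            if float(amount) < 0:
                anomalies.append((source_name, line_no, "amount", "negative value"))
        except ValueError:
            anomalies.append(
                (source_name, line_no, "amount", f"not a number: {amount!r}")
            )
    return anomalies


def write_anomalies(out, anomalies, run_date=None):
    """Append one CSV row per anomaly to an open text stream, stamped with the run date."""
    stamp = (run_date or date.today()).isoformat()
    writer = csv.writer(out)
    for source, line_no, field, issue in anomalies:
        writer.writerow([stamp, source, line_no, field, issue])


if __name__ == "__main__":
    feed = "region,amount\nEast,10\n,5\nWest,-3\nWest,abc\n"
    found = find_anomalies("sales_feed", feed)
    # Appending to a file keeps a running history across loads.
    with open("data_quality_log.csv", "a", newline="") as log:
        write_anomalies(log, found)
```

Because the log accumulates across runs, it doubles as the written record recommended above: the next person to run the load can see which sources misbehaved, and how.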