In the last post, we talked about defining the requirements for data analytics and profiling the data prior to load. Now, let’s consider how to filter, format, and deliver that data to the analytics application.
- Filter – the act of selecting the data of interest to be used in the analytic application. For example, you may only want customers who have been more than 5 days late with their payment. Or, you may be looking for customers who never use a specific credit card.
- Format – the act of conforming the data for the analysis. This may entail supplying a default code or verbiage for null or blank columns, recoding specific columns to match the target system, or changing datatypes. For example, you can cast dates to a single format to create consistency for the target analytic application.
- Deliver – the act of physically providing the data for the analysis. The data could be a CSV file, an XML file, database table(s), or a physical load into the analytics application using a specific import utility or tool. (A short sketch after this list walks through all three steps.)
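To make the three steps concrete, here is a minimal sketch in Python with pandas. The file name and column names (customer_payments.csv, days_late, credit_card_type, payment_date) are hypothetical, and delivery is assumed to be a CSV file that the analytics application can import; your own requirements, schema, and intake procedure will differ.

```python
import pandas as pd

# Hypothetical input: a customer payment extract with the columns used below.
df = pd.read_csv("customer_payments.csv")

# Filter: keep only customers more than 5 days late with a payment.
late = df[df["days_late"] > 5].copy()

# Format: supply a default code for null/blank columns, and cast dates
# to one consistent format for the target application.
late["credit_card_type"] = late["credit_card_type"].fillna("UNKNOWN")
late["payment_date"] = pd.to_datetime(late["payment_date"]).dt.strftime("%Y-%m-%d")

# Deliver: write a CSV file the analytics application can import.
late.to_csv("late_customers.csv", index=False)
```

The same pattern applies if delivery means loading database tables or running a vendor import utility; only the last step changes.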
In short, you do whatever is needed during implementation to prepare the data successfully: filtering (based on the requirements), formatting (based on the target application), and delivery (based on the intake procedures).
Any company preparing data for analytics will run into this. One of my clients spent weeks preparing data from multiple sources to load into their analytic application; the analysis of the loaded data took only three days, but the preparation and formatting took far longer.
What this company learned – and what many of us have taken away from these projects – is that the quality and integrity of the data are just as important for analysis as they are for any corporate reporting application. Profiling the input data can help with the preparation, and it certainly will not hurt; a quick sketch below shows the idea. The task of automating this process will be explained in my next post.
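As a rough illustration of that profiling point, a few lines of pandas can give you a quick picture of an input file before the filter-and-format work starts. The file name is again a placeholder; this is only a sketch of the kinds of checks involved, not a full profiling tool.

```python
import pandas as pd

# Placeholder file name; the point is the profiling pattern, not the schema.
df = pd.read_csv("customer_payments.csv")

# Quick profile: row count, nulls per column, distinct values, numeric ranges.
print("rows:", len(df))
print("nulls per column:")
print(df.isna().sum())
print("distinct values per column:")
print(df.nunique())
print("numeric column summary:")
print(df.describe())
```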