Filter, format and deliver: Managing data for analytics

0

In the last post, we talked about creating the requirements for the data analytics, and profiling the data prior to load.  Now, let’s consider how to filter, format and deliver that data to the analytics application.

  • Filter – the act of selecting the data of interest to be used in the analytic application.  For example, you may only want customers who have been more than 5 days late with their payment. Or, you may be looking for customers who never use a specific credit card.
  • Format – the act of conforming the data to be used in the analysis.  This may entail adding a default code or verbiage for null or blank columns.  Recoding specific columns based on target source or changing datatypes in the data.  For example, you can cast a specific date format to create consistency for the target analytic application.
  • Deliver – the act of physically providing the data for the analysis.  This data could be a .csv file, xml file, database table(s) or a physical load into the analytics application using a specific import utility or tool.

You basically have to do whatever is needed during implementation to successfully prepare the data. This can include filtering (based on requirements), formatting (based on target application), and delivery (based on intake procedures).

This is something that any company faced with preparing data for analytics will face. One of my clients spent weeks preparing data from multiple sources to load into their analytic application.  The analysis of the data loaded only took three days, but the preparation and formatting of the data took so much longer.

What this company learned – and what many of us have taken away from these projects – is that the quality and integrity of the data is just as important for analysis as it is for any corporate reporting application.  Profiling the input data may help with the preparation of the data, and it definitely would not hurt.  The task of automating this application will be explained in my next post.

 

Share

About Author

Joyce Norris-Montanari

President of DBTech Solutions, Inc

Joyce Norris-Montanari, CBIP-CDMP, is president of DBTech Solutions, Inc. Joyce advises clients on all aspects of architectural integration, business intelligence and data management. Joyce advises clients about technology, including tools like ETL, profiling, database, quality and metadata. Joyce speaks frequently at data warehouse conferences and is a contributor to several trade publications. She co-authored Data Warehousing and E-Business (Wiley & Sons) with William H. Inmon and others. Joyce has managed and implemented data integrations, data warehouses and operational data stores in industries like education, pharmaceutical, restaurants, telecommunications, government, health care, financial, oil and gas, insurance, research and development and retail. She can be reached at jmontanari@earthlink.net.

Related Posts

Leave A Reply

Back to Top