IT folks love SQL (Standard Query Language). Once you know how to program in SQL, you can work with almost any database because it is a standard. However, SQL is NOT a standard for doing analytics. The SAS programming language pre-dates SQL and even though SAS does SQL, SQL does not do SAS.
SQL is a programming language designed for managing data held in a relational database management system. SAS is a programming language designed to access data stored in any format, prepare it for analysis, perform analytics, and deliver insights and results based on this analysis.
Why is this important? Because when you move into the analytics world, SQL alone will just not cut it.
The decision to use "standard" ETL tools (which are SQL based) to support analytic environments and analytically derived data sources causes many headaches for IT, for business and ultimately for the organization as a whole. It can also limit what type of analytics can be used effectively and raises the overall risk of failure for analytic based initiatives or projects. In fact, one of the reasons "shadow IT" programs crop up in business units is because the data scientist types understand analytics preparation and exploration means more than what SQL has to offer.
Why is the typical ETL tool not the best choice for preparing data for analytics exploration and analysis? Let's look at a typical scenario.
IT uses their standard ETL tool to pull data from various sources to surface the data to their analytics team. This provides an initial "development" analytics data warehouse (ADW) for the analytics team. Then the analytics team uses analytical transformations and analytical data prep to stage a sandbox and what I will call analytical base tables (ABTs) that are used as the input for developing new insights, testing, and creating predictive models, forecasting models and optimization models.
Once a new insight has been discovered or a model has been developed, it must then be deployed into production systems for it to provide any real business value. For these analytical assets to work, they need the proper input values that reside in the analytical base tables. In other words, the necessary inputs reside in the ABTs which are located in the analytics environment, NOT in the initial ADW that IT provided.
This is why IT often struggles with getting analytics deployed because they are trying to use analytics with missing pieces.
Then, IT spends time and resources trying to understand what the analytics team did with the ADW data so that IT can duplicate it in, you guessed it, SQL. If IT used an analytics ETL tool instead, the organization would improve its overall data governance, auditability and provide a more streamlined process for deploying analytics into production. Using an analytics ETL to move the data from the analytics environment or the analytical base tables to a test and production "new" analytical datawarehouse, which is now the most efficient way to streamline deploying or operationalizing analytics.
To recap, recall my post describing the difference between business intelligence (BI) and business analytics (BA), and then consider these three data flow diagrams.
- This is a data flow that supports just business intelligence:
ETL from data sources --> dev(EDW) --> typical BI reporting --> test(EDW) --> prod(EDW) --> reports/end user consumption
- This is the data flow that IT believes will support business analytics:
ETL from data sources --> dev(EDW) --> ETL --> dev(ADW) --> analytic exploration/discovery --> ETL --> test(ADW) --> prod(ADW) --> reports/end user consumption
- This is an ideal data flow that should be used to support business analytics:
ETL from data sources --> dev(EDW) --> ETL to create dev (IT's ADW) --> analytic ETL --> dev(ABTs) analytic exploration/discovery --> analytical ETL --> test (ADW sourced from ABTs) --> prod (ADW)
It is human nature to fall back on what you are comfortable with or fully understand. Standard ETL tools based on SQL are good for moving around data (that already exists) and transforming it into different forms like a star schema (designed to optimize certain BI reporting queries). However, SQL alone does not render new insights that help "predict" what will happen or help drive more profit for an organization.
Metaphorically speaking, you can build a deck with nails (BI) and a hammer (standard ETL) is the right tool to use. If you want a better, stronger, longer-lasting deck then you should use screws (BA). However, if IT still wants to use a hammer (std ETL), it might be a problem. You actually can choose to use a hammer with screws, but why would you want too?
The concept described above is just one reason why SAS placed so high in the most recent report, The Forrester Wave™: Agile Business Intelligence Platforms, Q3 2014. Download it now to learn more.