PROC SQL
PROC SQL is a very powerful ANSI 92 compliant version of SQL that also allows us to leverage many unique SAS capabilities. Recently I was asked if the PROC SQL in Figure 1 could be refactored into PROC FedSQL so it could run faster by leveraging SAS Viya’s in-memory engine CAS (SAS® Cloud Analytic Services). I was struggling to find a way to refactor this into PROC FedSQL, so I reached out to the SAS Jedi (aka Mark Jordan) for help.
/* Original SQL Statements */ proc sql; create table BenchMark as select count(*) as ItemCount , sum( abs( nhits - nruns ) < 0.1*natbat ) as DIFF_10 from sashelp.baseball; run; |
Figure 1. Original PROC SQL
In Figure 2, we can review the SAS Log of our PROC SQL code.
- It is line 77 that we want to refactor into PROC FedSQL so we can benefit performance improvements by running that code distributed in CAS.
- On line 77, we use the alias DIFF_10 to create the new column name that is calculated by the two SAS functions SUM and ABS.
- The expression on line 77 will cause SQL to return a value of 1 if the condition is true and a value of 0 if the condition is false.
- The alias DIFF_10 will contain the summation of the value returned by the condition (i.e. 0 or 1) for all rows in our data set SASHELP.BASESBALL.
In Figure 5, we can review the results of our PROC SQL statement.
Figure 2. SAS Log of PROC SQL
PROC FedSQL
PROC FedSQL is ANSI 99 compliant without any of the unique SAS capabilities that PROC SQL contains, but PROC FedSQL is CAS enabled, which allows us to leverage SAS Viya’s distributed computing capabilities to improve run-times. Figure 3 is the refactored PROC FedSQL code that the SAS Jedi came up with.
/* PROC FedSQL code */ proc fedsql sessref=casauto; create table BenchMark as select count(*) as ItemCount , sum(case when (abs (nhits - nruns ) < (0.1*natbat) ) is true then 1 end ) as DIFF_10 from baseball; quit; |
Figure 3. CAS-enabled PROC FedSQL
Figure 4 contains the SAS Log of our CAS enabled PROC FedSQL.
- Notice on lines 77 we added a CASE statement to the SUM function for our alias DIFF_10.
- On lines 78-79, the WHEN statement return a value of 1 when the condition is true and a value of 0 when it is false.
- The alias DIFF_10 will contain the summation, of the value returned by the CASE statement (i.e. 0 or 1) for all rows in our CAS table BASESBALL.
In Figure 5 we can review the results of our PROC FedSQL statement.
Figure 4. SAS log of PROC FedSQL code
Figure 5. Validation that the values from the refactoring of PROC SQL into PROC FedSQL match
Conclusion
As we adopt SAS Viya, a goal is to leverage CAS to speed up the processing of routines written in PROC SQL. To accomplish this, refactor PROC SQL code into PROC FedSQL code. For PROC SQL that cannot be refactored, simply run that PROC SQL code as-is in SAS Viya’s SAS Programming Run-time Environment (SPRE).
2 Comments
Hi,
This is pretty late from the original post, but how can the FedSQL proc find the following:
from baseball;
Yvan
Hi Yvan,
Thank you for your comment, however, I do not understand your question. Please clarify.
Steve