If you're a SAS programmer who now uses SAS Viya and CAS, it's worth your time to optimize your existing programs to take advantage of the new environment. This post is a continuation of my SAS Global Forum 2020 paper Best Practices for Converting SAS® Code to Leverage SAS® Cloud Analytic Services and my SGF 2020 Super Demo.
The best approach for refactoring SAS code for SAS Viya has a few steps:
- First, "lift and shift" your existing code to run successfully in the compute server for SAS Viya.
- Next, create CASLIB statements for all of your data sources, e.g., sas7bdat files, CSV files, parquet files, relational databases, cloud data sources, etc.
- Finally, identify the longest running steps so you know where you have the biggest opportunities. For example, look at steps where the "real time" is 30 minutes or longer, as well as steps that are CPU bound. CPU-bound steps are steps where the CPU time is equal to or greater than the real time for that step.
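The two screening rules in this last step are simple enough to write down directly. The following Python sketch is not part of the SAS utility described below. It assumes each step's real time and CPU time have already been converted to seconds, and the `StepTiming` class, the function names, and the sample timings are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Threshold suggested above: steps whose real time is 30 minutes or longer.
LONG_RUNNING_SECONDS = 30 * 60


@dataclass
class StepTiming:
    step: str            # procedure or DATA step name, e.g. "PROCEDURE LOGISTIC"
    real_seconds: float  # real (clock) time in seconds
    cpu_seconds: float   # CPU time in seconds


def is_long_running(t: StepTiming) -> bool:
    """True when the step's real time is 30 minutes or longer."""
    return t.real_seconds >= LONG_RUNNING_SECONDS


def is_cpu_bound(t: StepTiming) -> bool:
    """True when the step's CPU time is equal to or greater than its real time."""
    return t.cpu_seconds >= t.real_seconds


def refactoring_candidates(timings: List[StepTiming]) -> List[StepTiming]:
    """Keep long-running or CPU-bound steps, longest real time first."""
    hits = [t for t in timings if is_long_running(t) or is_cpu_bound(t)]
    return sorted(hits, key=lambda t: t.real_seconds, reverse=True)


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    timings = [
        StepTiming("PROCEDURE SGPLOT", 2.79, 0.08),
        StepTiming("PROCEDURE SORT", 2400.0, 2300.0),
        StepTiming("DATA statement", 120.0, 150.0),
    ]
    for t in refactoring_candidates(timings):
        print(f"{t.step}: long-running={is_long_running(t)}, cpu-bound={is_cpu_bound(t)}")
```

With these sample values, PROCEDURE SORT is flagged for its real time and the DATA step for being CPU bound, while PROCEDURE SGPLOT is dropped.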
To help identify those steps, we can use a new utility that analyzes SAS logs and creates reports showing the Real Time and CPU Time of each step. Read on to learn more about this final step in the code refactoring process.
Utility: SAS Log Parser
To generate these reports, I created a SAS program that reads all SAS log files in a directory and creates one report per SAS log, as well as descending Real Time (clock time) and CPU Time reports. Figure 1 is an example of the report that is generated for each SAS log. In this report, we see the Real Time and CPU Time of each procedure and DATA step. These values are derived from SAS log entries like these:
NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           2.79 seconds
      cpu time            0.08 seconds
NOTE: The SAS System used:
      real time           1:08.86
      cpu time            1:18.18
Note that the Total Time and Total CPU Time fields are populated only when the SAS log contains the note “NOTE: The SAS System used:”. SAS programs that are run in batch, or that are submitted through RSUBMIT with MP CONNECT, generate this note. MP CONNECT and SAS Grid Computing enable you to execute RSUBMIT statements asynchronously; this requires SAS/CONNECT.
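To make the parsing concrete, here is a minimal Python sketch of the idea. It is not the utility's SAS code. It assumes the default layout shown above (a NOTE line followed by "real time" and "cpu time" lines), with times written as seconds ("2.79 seconds"), m:ss.ss ("1:08.86"), or h:mm:ss.ss. The names `parse_sas_log` and `to_seconds` are hypothetical. As in the utility, the totals stay empty when the log has no "NOTE: The SAS System used:" entry.

```python
import re

# Patterns for the default timing notes. Other timing lines, such as
# "user cpu time", do not match TIME_LINE and are ignored.
STEP_NOTE = re.compile(r"^NOTE: (.+?) used \(Total process time\):")
TOTAL_NOTE = re.compile(r"^NOTE: The SAS System used:")
TIME_LINE = re.compile(r"^\s*(real|cpu) time\s+(.+?)\s*$")


def to_seconds(value: str) -> float:
    """Convert '2.79 seconds', '1:08.86' (m:ss) or '1:02:05.50' (h:mm:ss) to seconds."""
    token = value.split()[0]           # drop a trailing 'seconds', if any
    total = 0.0
    for part in token.split(":"):      # fold hours, minutes, seconds left to right
        total = total * 60 + float(part)
    return total


def parse_sas_log(lines):
    """Return (steps, totals) parsed from the lines of one SAS log.

    steps:  list of dicts with keys 'step', 'real', 'cpu' (seconds)
    totals: dict with keys 'real' and 'cpu'; both stay None when the log
            has no 'NOTE: The SAS System used:' entry
    """
    steps = []
    totals = {"real": None, "cpu": None}
    current = None                     # record that receives the next time lines
    for line in lines:
        line = line.rstrip("\n")
        match = STEP_NOTE.match(line)
        if match:
            current = {"step": match.group(1), "real": None, "cpu": None}
            steps.append(current)
            continue
        if TOTAL_NOTE.match(line):
            current = totals
            continue
        match = TIME_LINE.match(line)
        if match and current is not None:
            current[match.group(1)] = to_seconds(match.group(2))
    return steps, totals


if __name__ == "__main__":
    sample = [
        "NOTE: PROCEDURE SGPLOT used (Total process time):",
        "      real time           2.79 seconds",
        "      cpu time            0.08 seconds",
        "NOTE: The SAS System used:",
        "      real time           1:08.86",
        "      cpu time            1:18.18",
    ]
    steps, totals = parse_sas_log(sample)
    print(steps)
    print(totals)
```

Converting everything to seconds up front makes the later sorting and threshold checks plain numeric comparisons.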
Descending Real Time Report
Figure 2 contains an example of the descending Real Time report. In this report, the Step column shows that the longest-running step is a PROC LOGISTIC that takes over 14 hours (Real Time column) and comes from the SAS log Sample3.log (File Name column). The best way to use this report is to focus on steps that take longer than 30 minutes; in our case, that is 9 steps from 3 SAS logs. With that list, we can review the details of each step and then benchmark whether it runs faster in SAS® Cloud Analytic Services (CAS). Note that for CAS to process data, all of the data must be in CAS tables and the step must be coded with CAS-enabled steps.
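The descending Real Time report boils down to a sort plus a 30-minute filter. The Python sketch below illustrates that logic only. It assumes each step has already been reduced to a row of file name, step name, real time, and CPU time in seconds. The `Step` row type, `hms`, `descending_real_time`, and the sample rows are all hypothetical.

```python
from collections import namedtuple

# One report row; times are in seconds. Field names are hypothetical.
Step = namedtuple("Step", ["file", "step", "real", "cpu"])

FOCUS_SECONDS = 30 * 60  # focus on steps that run 30 minutes or longer


def hms(seconds: float) -> str:
    """Format seconds as h:mm:ss.ss for display."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{int(hours)}:{int(minutes):02d}:{secs:05.2f}"


def descending_real_time(steps, threshold=FOCUS_SECONDS):
    """Sort steps by real time (longest first) and pick those at or over the threshold."""
    ordered = sorted(steps, key=lambda s: s.real, reverse=True)
    focus = [s for s in ordered if s.real >= threshold]
    return ordered, focus


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        Step("job_a.log", "PROCEDURE SORT", 95.4, 90.1),
        Step("job_c.log", "PROCEDURE LOGISTIC", 50400.0, 47000.0),
        Step("job_b.log", "DATA statement", 2100.0, 300.0),
    ]
    ordered, focus = descending_real_time(rows)
    for obs, s in enumerate(ordered, start=1):
        print(obs, s.file, s.step, hms(s.real), hms(s.cpu))
    files = {s.file for s in focus}
    print(f"{len(focus)} steps from {len(files)} SAS logs run 30 minutes or longer")
```

Keeping the full sorted list alongside the filtered one mirrors how the report is read: scan everything, but act on the rows above the threshold.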
Descending CPU Time Report
Figure 3 contains an example of the descending CPU Time report. The most CPU-intensive step takes over 13 hours (CPU Time column) and comes from Sample3.log (File Name column). If you compare the Real Time and CPU Time columns, you will notice that only observation 11 (Obs column) has a CPU Time that exceeds its Real Time, which makes it CPU bound. However, we would not focus on this step because its Real Time is less than 30 minutes.
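The reasoning around observation 11 can be sketched the same way: sort by CPU time, find the CPU-bound rows, then set aside those whose real time is under 30 minutes. This is a hypothetical Python illustration, not the utility's code. The row type and function names are made up, times are in seconds, and the CPU-bound test uses the "equal to or greater than" definition from the start of this post.

```python
from collections import namedtuple

# Hypothetical report row; times are in seconds.
Step = namedtuple("Step", ["file", "step", "real", "cpu"])

FOCUS_SECONDS = 30 * 60


def descending_cpu_time(steps):
    """Sort steps by CPU time, most CPU-intensive first."""
    return sorted(steps, key=lambda s: s.cpu, reverse=True)


def cpu_bound_rows(steps, threshold=FOCUS_SECONDS):
    """List (obs, step, worth_benchmarking) for every CPU-bound step.

    Obs numbers follow the descending CPU order. A CPU-bound step is only
    worth benchmarking when its real time also reaches the threshold.
    """
    result = []
    for obs, s in enumerate(descending_cpu_time(steps), start=1):
        if s.cpu >= s.real:
            result.append((obs, s, s.real >= threshold))
    return result


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        Step("job_a.log", "PROCEDURE LOGISTIC", 50000.0, 47000.0),
        Step("job_b.log", "PROCEDURE HPSPLIT", 600.0, 1500.0),
        Step("job_c.log", "DATA statement", 30.0, 1.0),
    ]
    for obs, s, worth in cpu_bound_rows(rows):
        print(obs, s.step, "benchmark" if worth else "CPU bound but under 30 minutes")
```

In this made-up data, the only CPU-bound row runs for 10 minutes of real time, so it is reported but not prioritized.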
Source code for the SAS Log Parser
I've bundled my SAS code for these steps in my GitHub repository for SAS Global Forum 2020. You'll find these programs along with the other code that supports my previous topics of adapting SAS 9 code for SAS Viya.
sasLogParserMacros.sas contains the macros that drive the process. The macro %LIST_FILES lists all files with a given extension, %CHECK checks whether a file exists and deletes it if found, and %SASLOG parses a SAS log and extracts the values shown in the reports. When you save this file, name it sasLogParserMacros.sas and save it in the same directory as sasLogParser.sas.
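For readers more at home outside the SAS macro language, the file-handling jobs of %LIST_FILES and %CHECK can be approximated in a few lines of Python. This is a rough analogue, not a translation of the macros. The function names are hypothetical, and the case-insensitive extension match is a choice of this sketch rather than documented macro behavior.

```python
import tempfile
from pathlib import Path


def list_files(directory, extension):
    """Rough analogue of %LIST_FILES: files in one directory with the given extension.

    The extension match is case-insensitive; that is a choice of this sketch.
    """
    wanted = "." + extension.lower().lstrip(".")
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() == wanted)


def check(path):
    """Rough analogue of %CHECK: delete the file if it exists.

    Returns True when a file was deleted, False when there was nothing to delete.
    """
    target = Path(path)
    if target.is_file():
        target.unlink()
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("Sample1.log", "Sample2.log", "notes.txt"):
            (Path(tmp) / name).write_text("")
        print([p.name for p in list_files(tmp, "log")])   # the two .log files
        print(check(Path(tmp) / "notes.txt"))             # True: the file was deleted
        print(check(Path(tmp) / "notes.txt"))             # False: already gone
```

Deleting an existing output before regenerating it, as %CHECK does, avoids mixing results from an earlier run into a new report.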
sasLogParser.sas is the program we submit to produce the reports. It includes sasLogParserMacros.sas and then generates the reports. The only statements we need to modify are the first two %LET statements in this program. The first %LET statement points to the directory that contains the two SAS programs, sasLogParserMacros.sas and sasLogParser.sas. The second %LET statement points to the directory containing all the SAS logs we want to parse. Other than these two %LET statements in sasLogParser.sas, do not change any statement in either program.
Updates to the utility as of 19Feb2021
Based on customer feedback, I have modified the utility in the following ways:
1. The utility will now parse the parent directory and all of its sub-directories for files with the extension of “.log”.
2. All reports are written to a “reports” directory located in the directory where you installed the utility.
3. A new report, “3.StepsFrequency.pdf”, shows the frequency of the steps found in the log files.
4. A new report, “4.totalseconds.xlsx”, converts all of the time variables found in the “1.descendingRealTime.pdf” report into total seconds.
5. The spreadsheets “1.descendingRealTime.xlsx”, “2.descendingCPUTime.xlsx”, and “3.StepsFrequency.xlsx” are also produced and can be found in the “reports” directory.
6. All reports are based on the SAS data set “logs.sas7bdat”, which is located in the “reports” directory.
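The recursive search in update 1 and the frequency report in update 3 amount to a directory walk plus a counter. This Python sketch illustrates the idea. It is not the utility's SAS implementation, `find_logs` and `step_frequency` are hypothetical names, and a "step" here is simply the text between "NOTE:" and "used (Total process time):".

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

STEP_NOTE = re.compile(r"^NOTE: (.+?) used \(Total process time\):")


def find_logs(parent):
    """Every *.log file under `parent`, including all of its sub-directories."""
    return sorted(p for p in Path(parent).rglob("*.log") if p.is_file())


def step_frequency(log_paths):
    """Count how often each step (procedure or DATA step) appears across the logs."""
    counts = Counter()
    for path in log_paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                match = STEP_NOTE.match(line)
                if match:
                    counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        sub = Path(parent) / "batch" / "nightly"
        sub.mkdir(parents=True)
        note = "NOTE: {} used (Total process time):\n"
        (Path(parent) / "Sample1.log").write_text(note.format("PROCEDURE SORT"))
        (sub / "Sample2.log").write_text(note.format("PROCEDURE SORT") + note.format("DATA statement"))
        logs = find_logs(parent)
        print(len(logs))                          # 2: one at the top, one nested
        print(step_frequency(logs).most_common())  # [('PROCEDURE SORT', 2), ('DATA statement', 1)]
```

`rglob("*.log")` also matches files directly in the parent directory, so a single call covers both the top level and every sub-directory.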
Updates to the utility as of 30Apr2021
Based on customer feedback, I have sped up the utility at the cost of generating fewer reports. For the “turbo charged” version, download these two files from my GitHub repository:
sasLogParserMacrosTurboCharged.sas. When you save this program to disk, name it sasLogParserMacros.sas, because sasLogParserTurboCharged.sas includes this file under the name sasLogParserMacros.sas.
sasLogParserTurboCharged.sas. This is the program that includes sasLogParserMacros.sas and produces the reports.
To determine which steps are good candidates for the in-memory CAS engine, we must first understand the real time and CPU time of each step. Then we can benchmark which SAS Viya engine is appropriate for that step: the compute server or CAS. The code that I've shared runs in SAS 9 or SAS Viya on Windows or Linux.