Are you a statistical programmer whose company has adopted SAS Viya? If so, you probably know that the DATA step can run in parallel in SAS Cloud Analytic Services (CAS). As Secosky (2017) says, "running in a single thread in SAS is different from running in many threads in CAS." He goes on to state, "you cannot take any DATA step, change the librefs used, and have it run correctly in parallel. You ... have to know what your program is doing to make sure you know what it does when it runs in parallel."
This article discusses one aspect of "know what your program is doing." Specifically, to run in parallel, the DATA step must use only functions and statements that are "CAS-enabled." Most DATA step functions run in CAS. However, there is a set of "SAS-only" functions that do not run in CAS. This article discusses these functions and provides a link to a list of the SAS-only functions. It also shows how you can get SAS to tell you whether your DATA step contains a SAS-only function.
DATA steps that run in CAS
By default, the DATA step will attempt to run in parallel when the step satisfies three conditions (Bultman and Secosky, 2018, p. 2):
- All librefs in the step are CAS engine librefs to the same CAS session.
- All statements in the step are supported by the CAS DATA step.
- All functions, CALL routines, formats, and informats in the step are available in CAS.
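To make the first condition concrete, the following is a minimal sketch of how you might start a CAS session and define a CAS engine libref that is bound to it. The session name MySess and the libref MyCASLib match the names that appear later in this article. The choice of the CASUSER caslib, the input table MyCASLib.Have, and its variable x are assumptions for illustration only.

```sas
/* Start a CAS session. The session name MySess is an assumption for this sketch. */
cas MySess;

/* Define a CAS engine libref that is bound to the MySess session.
   CASUSER (your personal caslib) is an assumed choice of caslib. */
libname MyCASLib cas caslib=casuser sessref=MySess;

/* The input and output tables both use the same CAS engine libref,
   so the first condition is satisfied.
   MyCASLib.Have is a hypothetical input table that contains a numeric variable x. */
data MyCASLib.Want;
   set MyCASLib.Have;
   y = 2*x;
run;
```

If you change the output table to a Base SAS library such as WORK, the step no longer satisfies the first condition, so it does not run in CAS.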
The present article is concerned with the third condition. How can you know in advance whether all functions and CALL routines are available in CAS?
A list of DATA step functions that do not run in CAS
For every DATA step function, the SAS Viya documentation indicates whether the function is supported on CAS or not. For example, the following screenshots of the Viya 3.5 documentation show the documentation for the RANUNI function and the RAND function:
Notice that the documentation for the RANUNI function says, "not supported in a DATA step that runs in CAS," whereas the RAND function is in the "CAS category," which means that it is supported in CAS. Consequently, if you use the RANUNI function in a DATA step, the DATA step will not run in CAS. (The same is true for the other old-style random number functions, such as RANNOR.) Instead, it will try to run in SAS. This could result in copying input data from CAS, running the program in a single thread, and copying the final data set into a CAS table. Copying all that data is not efficient.
Fortunately, you do not need to look up every function to determine if it is a CAS-enabled or SAS-only function. The documentation now includes a list, by category, of the Base SAS functions (and CALL routines) that do not run in CAS. The following screenshot shows the top of the documentation page.
Can you always rewrite a DATA step to run in CAS?
For the example in the previous section (the RANUNI function), there is a newer function (the RAND function) that provides the same functionality and is supported in CAS. Thus, if you have an old SAS program that uses the RANUNI function, you can replace that call with RAND("UNIFORM"), and the modified DATA step can run in CAS, provided that the rest of the step is also CAS-enabled. Unfortunately, not all functions have a CAS-enabled replacement. There are about 200 Base SAS functions that do not run in CAS, and most of them cannot be replaced by an equivalent function that runs in CAS.
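The following is a minimal sketch of that rewrite. Because the RAND function does not take a seed argument, the seed moves from the RANUNI call into the CALL STREAMINIT routine. I assume that CALL STREAMINIT is CAS-enabled; you can confirm that in its documentation entry. The tables MyCASLib.Have and MyCASLib.Want are hypothetical. Be aware that RANUNI and RAND use different random number generators, so the rewritten step does not reproduce the random values from the original step.

```sas
/* Original step: RANUNI is a SAS-only function, so this step cannot run in CAS */
data MyCASLib.Want;
   set MyCASLib.Have;                     /* hypothetical input table */
   u = ranuni(1234);
run;

/* Rewritten step: RAND is CAS-enabled, and CALL STREAMINIT supplies the seed */
data MyCASLib.Want;
   set MyCASLib.Have;
   if _n_ = 1 then call streaminit(1234); /* set the seed before the first call to RAND */
   u = rand("UNIFORM");
run;
```

When the rewritten step runs in CAS, the log should contain the note "Running DATA step in Cloud Analytic Services."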
The curious reader might wonder why certain classes of functions cannot run in CAS. Here are a few sets of functions that do not run in CAS, along with the reasons why:
- Functions specific to the English language, such as the SOUNDEX and SPEDIS functions. Also, functions that are specific to single-byte character sets (especially I18N Level 0). Most of these functions are not applicable to an international audience that uses UTF-8 encoding.
- Functions and statements for reading and writing text files, such as the INFILE and INPUT statements and the FOPEN and FCLOSE functions. There are other ways to import text files into CAS.
- Macro-related functions such as SYMPUT and SYMGET. Remember: There are no macro variables in CAS! The macro pre-processor is a SAS-specific feature, and one of the principles of SAS Viya is that programmers who use other languages (such as Python or Lua) should have equal access to the functionality in Viya.
- Old-style functions for generating random numbers from probability distributions. Use the RAND function instead.
- Functions that rely on single-threaded execution on a data set that has ordered rows. Examples include DIF, LAG, and some permutation/combination functions such as ALLCOMB. Remember: A CAS data table does not have an inherent order.
- The US-centric state and ZIP code functions.
- Functions for working with Git repositories.
How to force the DATA step to stop if it is not able to run in CAS
When a DATA step runs in CAS, you will see a note in the log that says:
NOTE: Running DATA step in Cloud Analytic Services.
If the DATA step runs in SAS, no note is displayed. Suppose that you intend for a DATA step to run in CAS but you make a mistake and include a SAS-only function. What happens? The default behavior is to (silently) run the DATA step in SAS and then copy the (possibly huge) data into a CAS table. As discussed previously, this is not efficient.
You might want to tell the DATA step that it should run in CAS or else report an error. You can use the SESSREF= option to specify that the DATA step must run in a CAS session. For example, if MySess is the name of your CAS session, you can submit the following DATA step:
```sas
/* use the SESSREF= option to force the DATA step to report an ERROR if it cannot run in CAS */
data MyCASLib.Want / sessref=MySess;
   x = ranuni(1234);   /* this function is not supported in the CAS DATA step */
run;
```

```
NOTE: Running DATA step in Cloud Analytic Services.
ERROR: The function RANUNI is unknown, or cannot be accessed.
ERROR: The action stopped due to errors.
```
In the log, the NOTE says that the step was submitted to a CAS session. The first ERROR message reports that your program contains a function that is not available in CAS. The second ERROR message reports that the DATA step stopped running. A DATA step that runs in CAS calls the dataStep.runCode action, which is why the message says "the action stopped."
This is a useful trick. The SESSREF= option forces the DATA step to run in CAS. If it cannot, it tells you which function is not CAS-enabled.
Other ways to monitor where the DATA step runs
The DATA step documentation in SAS Viya contains more information about how to control and monitor where the DATA step runs. In particular, it discusses how to use the MSGLEVEL=I system option to get detailed information about where a DATA step ran and in how many threads. The documentation also includes additional examples and best practices for running the DATA step in CAS. I recommend the SAS Cloud Analytic Services: DATA Step Programming documentation as the first step towards learning the advantages and potential pitfalls of running a DATA step in CAS.
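MSGLEVEL= is a SAS system option, so you set it in an OPTIONS statement before you submit the step. The following minimal sketch assumes a CAS engine libref named MyCASLib and a hypothetical input table MyCASLib.Have that contains a numeric variable x. After you run it, look for the additional informational notes in the log; their exact wording can vary between releases.

```sas
/* Request informational messages, which include details about where the DATA step ran */
options msglevel=i;

data MyCASLib.Want;
   set MyCASLib.Have;     /* hypothetical input table in a CAS engine library */
   y = 2*x;               /* assumes Have contains a numeric variable x */
run;

/* Restore the default message level */
options msglevel=n;
```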
Summary
The main purpose of this article is to provide a list of Base SAS functions (and CALL routines) that do not run in CAS. If you include one of these functions in a DATA step, the DATA step cannot run in CAS. This can be inefficient. You can use the SESSREF= option on the DATA statement to force the DATA step to run in CAS. If it cannot run in CAS, it stops with an error and informs you which function is not supported in CAS.
3 Comments
Thank you for sharing
Very nice. Is there a way to let it run and view the log to see what specifically ran in CAS and what didn't to help understand what you need to adjust?
I believe I answer this question in the section "How to force the DATA step to stop if it is not able to run in CAS." Note that either the entire step runs in CAS or it doesn't. Contrary to what your question suggests, you cannot have a DATA step for which some parts run in CAS and other parts do not.