Just because you are using CAS actions doesn't mean you can forget about the powerful SAS DATA step. The dataStep.runCode CAS action is here!
Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page.
In this example, I will use the CAS procedure to execute the dataStep.runCode CAS action. Be aware that instead of the CAS procedure, I could execute the action from Python, R, or even a REST API, with some slight changes to the syntax for the specific language.
Why use the DATA Step?
It's pretty simple: the DATA step is a powerful way to process your data. It gives you full control of each row and column, lets you easily create multiple output tables, and provides a variety of statements to do pretty much anything you need.
In this example, I will use the DATA step to quickly create three CAS tables based on the value of a column. Before we execute the DATA step, let's view the frequency values of the Origin column in the cars table. To do that, I'll use the simple.freq action.
proc cas;
    simple.freq /
        table={name='cars', caslib='casuser'},
        input='Origin';
quit;
The result of the freq action shows that the Origin column in the cars CAS table has three distinct values: Asia, Europe and USA. I can use that information to create three CAS tables, one for each of these unique values, using the SAS DATA step.
Execute DATA Step in SAS Viya's CAS Server
One way to execute the DATA step directly in CAS is to use the runCode action with the code parameter. In the code parameter just specify the DATA step as a string. That's it!
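To illustrate the "DATA step as a string" idea in its simplest form, here is a minimal sketch that passes the code inline as a quoted string. It assumes the cars table is already loaded in the casuser caslib, as in the rest of this post. The output table name cars_copy is hypothetical and used only for illustration.

```sas
proc cas;
    /* The whole DATA step is passed as one quoted string.           */
    /* cars_copy is a hypothetical output table name for this sketch. */
    dataStep.runCode /
        code="data casuser.cars_copy; set casuser.cars; run;";
quit;
```

An inline string works well for a one-line DATA step. For anything longer, storing the code in a variable tends to be easier to read and maintain.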
In this example, I'll add the DATA step within a SOURCE block. The SOURCE block stores the code as a variable, here named originTables. This DATA step will create three CAS tables, one table for each unique value of the Origin column in the cars table.
proc cas;
    source originTables;
        data casuser.Asia
             casuser.Europe
             casuser.USA;
            set casuser.cars;
            if Origin='Asia' then output casuser.Asia;
            else if Origin='Europe' then output casuser.Europe;
            else if Origin='USA' then output casuser.USA;
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;
The runCode action executes the DATA step in the distributed CAS environment and returns information about the input and output tables. Notice three CAS tables were created: Asia, Europe and USA.
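If you want to confirm the new tables outside of the runCode results, one option is the table.tableInfo action, which lists the in-memory tables in a caslib. This is a sketch of that check and not part of the original example. It assumes the three tables were created in the casuser caslib by the DATA step shown earlier.

```sas
proc cas;
    /* List the in-memory tables in the casuser caslib.              */
    /* Asia, Europe and USA should appear alongside the cars table.  */
    table.tableInfo / caslib='casuser';
quit;
```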
DATA Step in CAS has Limitations
Now, one thing to be aware of is that not all DATA step functionality is available in CAS. If you use the runCode action with a statement or function that is unsupported in CAS, you will receive an error. Let's look at an example using the FIRST function, which returns the first character of a string and is not supported in CAS.
proc cas;
    source originTables;
        data casuser.bad;
            set casuser.cars;
            NewCol=first(Model);
        run;
    endsource;
    dataStep.runCode / code=originTables;
quit;
The runCode action returns an error. The error occurs because the FIRST function is unknown or cannot be accessed. In situations like this, you will need to find a CAS-supported method to complete the task. (HINT: Here, instead of the FIRST function, you can use the SUBSTR function.)
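Following the hint, here is a minimal sketch of the same DATA step rewritten with SUBSTR. The output table name casuser.good and the variable name fixedCode are hypothetical, chosen only to contrast with the failing example.

```sas
proc cas;
    source fixedCode;
        data casuser.good;   /* hypothetical output table name */
            set casuser.cars;
            /* substr(string, position, length):               */
            /* take 1 character starting at position 1.        */
            NewCol=substr(Model, 1, 1);
        run;
    endsource;
    dataStep.runCode / code=fixedCode;
quit;
```

If the values in your column have leading blanks, the first character returned will be a blank, so you may want to left-align the value before taking the substring.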
For more information visit Restrictions and Supported Language Elements. Be sure to find the version of your SAS Viya environment.
Summary
In SAS Viya, the runCode action provides an easy way to execute most traditional DATA step code in CAS from a variety of languages: the CAS Language (CASL), Python, R, Lua, Java and more.
Additional Resources
runCode Action
DATA Step Action Set: Details
Restrictions and Supported Language Elements
SOURCE statement
SAS® Cloud Analytic Services: Fundamentals
Code