What is a CAS-enabled procedure?

I attended a seminar last week whose purpose was to inform SAS 9 programmers about SAS Viya. I could tell from the programmers' questions that some of them were confused about three basic topics:

  • What are the computing environments in Viya, and how should a programmer think about them?
  • What procedures do you get when you order a programming-oriented Viya product such as SAS Visual Statistics or SAS Econometrics? Of these procedures, which are CAS-enabled?
  • If you have legacy SAS programs, can you still run them if your company migrates from SAS 9 to SAS Viya?

I am a programmer, so I thought it might be helpful for me to discuss these topics programmer-to-programmer. In a series of articles, I am going to discuss issues that a SAS statistical programmer might face when migrating to Viya from SAS 9. I use the term "SAS 9" to refer to the SAS Workspace Server that runs procedures in traditional products such as Base SAS, SAS/STAT, and SAS/ETS. So "SAS 9" refers to the most recent version of the "classic" SAS programming environment. It is the version of SAS that existed before SAS Viya was created.

Clients and servers: Where do computations occur in SAS Viya?

In SAS 9, a procedure runs on the SAS Workspace Server. In SAS 9, the word "client" refers to a program such as Enterprise Guide (EG) or SAS Studio, which runs on a PC and submits code to the SAS Workspace Server. The server computes the results and sends tables and graphs back to the client, which displays them. Typically, the input and output data sets remain on the server.

You can think of SAS Viya as having two main components: the CAS server, where the data are stored and the computations are performed, and support for several client languages. A client language enables you to connect to the CAS server and tell it what analyses you want to perform. So, in the world of Viya, "client" no longer refers to a GUI like EG, but to an entire programming environment such as SAS, Python, or R. The purpose of the client software is to connect to CAS, submit actions, and get back results. You then use the capabilities of the client language to display the results as a table or graph. For example, the SAS client uses ODS to display tables and graphs. In Python, you might use matplotlib to graph the results. In R, you might use ggplot2. In all cases, you can also use the native capabilities of the client language (DATA step, Pandas, the tidyverse, and so on) to modify, aggregate, or enhance the output.

I use the SAS client to connect to and communicate with the CAS server. By using a SAS client to communicate with CAS, I can leverage my 25 years of SAS programming knowledge and skills. Others have written about how to use other clients (such as a Python client) to connect to CAS and to call CAS actions.

What procedures do you get when you order a programming-oriented Viya product?

When you purchase a product in SAS Viya, you get three kinds of computational capabilities:

  • Actions, which run on the CAS server. You can call an action from any client language.
  • CAS-enabled procedures, which are parsed on the SAS client but call CAS actions "under the covers."
  • Legacy SAS procedures that run on the SAS client, just as they do in SAS 9.

Obviously, the CAS-enabled and legacy procedures are only available on the SAS client.

To give an example, SAS Visual Statistics contains action sets (which contain actions), CAS-enabled procedures, and all the procedures in SAS/STAT. All procedures run on the SAS compute server, which is also called the SAS client. (The SAS compute server was formerly known as the SAS programming runtime environment, or SPRE.) However, the CAS-enabled procedures call one or more actions that run on the CAS server, then display the results as ODS tables and graphics.

A CAS-enabled procedure performs very few computations on the client. In contrast, a legacy procedure that is not CAS-enabled performs all of its computations on the SAS client. It does not call any CAS actions. An example of a CAS-enabled procedure is the REGSELECT procedure, which performs linear regression with feature selection. It contains many of the features of the GLMSELECT procedure, which is a traditional regression procedure in SAS/STAT.
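
As a rough illustration, the three kinds of capabilities can be summarized in a small lookup table. The following Python sketch is only a toy model of the description above: the names `Capability`, `available_from`, and `computes_on` are hypothetical and are not part of any SAS or CAS API, and the list of client languages contains only the ones named in this article.

```python
from dataclasses import dataclass

# Client languages named in this article (illustrative, not exhaustive).
CLIENT_LANGUAGES = ("SAS", "Python", "R")


@dataclass(frozen=True)
class Capability:
    kind: str         # label used in this article
    computes_on: str  # where most of the computation happens
    clients: tuple    # client languages that can call it


# Hypothetical summary of the three kinds of capabilities described above.
CAPABILITIES = (
    Capability("action", "CAS server", CLIENT_LANGUAGES),
    Capability("CAS-enabled procedure", "CAS server", ("SAS",)),
    Capability("legacy procedure", "SAS client", ("SAS",)),
)


def available_from(client):
    """Return the kinds of capability that the given client language can call."""
    return [c.kind for c in CAPABILITIES if client in c.clients]


def computes_on(kind):
    """Return where a capability of the given kind does most of its work."""
    for c in CAPABILITIES:
        if c.kind == kind:
            return c.computes_on
    raise KeyError(kind)


if __name__ == "__main__":
    for client in CLIENT_LANGUAGES:
        print(client, "->", available_from(client))
```

In this model, a Python or R client sees only actions, whereas the SAS client sees all three kinds, which matches the statement that procedures are available only on the SAS client.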

What are some CAS-enabled procedures?

The following links are helpful for discovering the names and functionality of CAS-enabled procedures:

Can you run legacy programs in SAS Viya?

Naturally, SAS 9 statistical programmers want to make sure that their existing programs will run in Viya. That is why SAS Visual Statistics comes with the legacy SAS/STAT procedures. The same is true for SAS/ETS procedures, which are shipped as part of SAS Econometrics. And the SAS IML product in Viya contains PROC IML, which runs on the SAS client, as well as the newer iml action, which runs in CAS.

So what happens if, for example, you call PROC REG in SAS and ask it to perform a regression on a SAS data set in the WORK libref? PROC REG will do what it has always done. It will run in the SAS environment. It will not run on the CAS server. It will not magically run faster than it used to in SAS 9. The performance of most legacy programs should be comparable to their performance in SAS 9.

There are some exceptions to that rule. Some SAS procedures have been enhanced and now perform better than their SAS 9 counterparts. For example, the SAS IML team has enhanced certain functions in PROC IML in SAS Viya so that they perform better than the same functions in the SAS 9 version of the procedure. The SAS IML development team is focused exclusively on improving performance and adding features in SAS Viya, for both PROC IML and the iml action.

Another exception is that some Base SAS procedures were enhanced so that they behave differently depending on the location of the data. Many Base SAS procedures are now hybrid procedures. If you tell them to analyze a CAS table, they will call an action, which runs in CAS, and retrieve the results. If you tell them to analyze a SAS data set, they will run on the SAS client, just like they do in SAS 9. For example, PROC MEANS will call the aggregation.aggregate action to compute descriptive statistics on variables in a CAS table.

To make the situation more complicated, some of the legacy Base SAS procedures support features that are not supported in CAS. When you request an option that is not supported in CAS, the procedure will download the data from CAS into SAS and perform the computation on the client. This can be inefficient, so check the documentation before you start using legacy procedures to analyze CAS tables. As a rule, I prefer to use legacy procedures to analyze SAS data sets on the client; I use newer CAS-enabled procedures for analyzing CAS tables.
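
The decision that a hybrid procedure makes can be sketched as a simple dispatch rule. The following Python function is a toy model under stated assumptions, not how any SAS procedure is implemented: the function name `where_hybrid_runs`, the location labels, and the option names such as `OPTION_A` are all hypothetical. Which options a real procedure supports in CAS is something to look up in its documentation.

```python
def where_hybrid_runs(data_location, requested_options=(), cas_supported=frozenset()):
    """Toy model of where a hybrid Base SAS procedure does its computation.

    data_location     : "SAS data set" or "CAS table" (hypothetical labels)
    requested_options : options the program asks for (hypothetical names)
    cas_supported     : options assumed to be supported in CAS
    """
    if data_location == "SAS data set":
        # Same behavior as in SAS 9: everything runs on the SAS client.
        return "runs on SAS client"
    if data_location != "CAS table":
        raise ValueError(f"unknown data location: {data_location!r}")

    unsupported = [opt for opt in requested_options if opt not in cas_supported]
    if unsupported:
        # The inefficient case: data are moved from CAS to the client first.
        return "downloads data, then runs on SAS client"
    # The efficient case: the procedure calls an action and retrieves results.
    return "calls CAS action"


if __name__ == "__main__":
    supported = frozenset({"OPTION_A"})  # hypothetical option name
    print(where_hybrid_runs("SAS data set", ["OPTION_A"], supported))
    print(where_hybrid_runs("CAS table", ["OPTION_A"], supported))
    print(where_hybrid_runs("CAS table", ["OPTION_A", "OPTION_B"], supported))
```

The third call shows the case to watch for: a single unsupported option is enough to pull the whole table down to the client.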

Summary

At a recent seminar for SAS 9 programmers, there were lots of questions about SAS Viya and what it means for a SAS programmer to start programming in Viya. This article is the first of several articles that I intend to write for SAS programmers. I don't know everything, but I hope that other SAS programmers will join me in sharing what they have learned about the process of migrating from SAS 9 to SAS Viya.

If you are a SAS programmer who has general questions about SAS Viya, let me know what you are thinking by leaving a comment. I might not know the answer, but I'll try to find someone who does. For programming questions ("How do I do XYZ in Viya?"), post your question and sample data to the SAS Support Communities. There is a dedicated community for SAS Viya questions, so take advantage of the experts there.

In a subsequent article, I will discuss the difference between a "caslib" and a "libref" in SAS Viya.

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

5 Comments

  1. Pingback: Caslibs and librefs in SAS Viya - The DO Loop

  2. Hi Rick. 18 months ago we switched to Viya from SAS 9.
    At the beginning it was difficult to understand how CAS works. The learning curve was really steep.
    The cas actions felt unnatural in comparison to the sas coding in SAS9.
    Now I'm a big fan of the cas environment. It runs super fast and many cas actions let you save data steps in between.
    I can only encourage SAS users to adopt cas as their programming environment.
    I remember the following issues I faced when making the transition.
    New way of sorting behaviour in SAS
    Difference between tables and files and how to play with files as a safeguard for your reports in Visual Analytics
    Conflicting Naming conventions in cas when importing files through the manage data option (32 length issue)

    These challenges have been overcome and, as I mentioned, I encourage new users of Viya to take the following advice.
    Do not invest too much time in rewriting your SAS9 code to CAS. Do it the other way round. Replace old procedures and client data steps with their siblings in cas and caslib data steps.

    • Rick Wicklin

      Thanks for sharing your experiences. Yes, I echo your advice. It is too daunting to attempt to rewrite 10 or 20 years of legacy code. Instead, start by doing two things:
      1. Figure out which analyses are taking a long time and investigate converting those analyses to CAS.
      2. Consider writing new analyses in CAS, when appropriate.

  3. I use SAS only for data processing (importing, exporting, text mining, etc.). What advantage would I have with Viya compared with SAS 9? A typical scenario is a stock of 40,000 JSON files (590GB) downloaded from a server as input, resulting in a SAS dataset of 60GB (CHAR-compressed) with 130 million obs and 32 vars. The import on my i9-10980 takes about 3-4 days and subsequent data steps take around another 2 full days. Moving such data across a network is not impossible, but where would such data best be stored, since every data step results in a modified dataset?

    • Rick Wicklin

      You have asked excellent questions.

      In general, the most efficient plan would read the data once into CAS tables. Viya can read, write, and process data in a distributed fashion, which means that moving to a server that has multiple nodes (or at least multiple cores (CPUs)) will enable the data to be read in parallel. When you use k cores, the data can be read up to k times faster. Then (more importantly), you can use CAS-enabled procedures to analyze the data in parallel. In general, analyzing 130M observations and 32 variables should not be a big challenge, although I don't know the details of your process.

      To explicitly answer your question, your workflow seems like a good scenario for parallel processing on CAS tables. SAS has videos and web pages where you can learn more about SAS Viya and parallel processing of data. However, I am an analyst/statistician, so my experience with large data-processing projects is limited. To talk to experts, I suggest you start a discussion on the SAS Viya forum at the SAS Support Communities. There are many experts there that have experience with projects like this.
