Caslibs and librefs in SAS Viya

When SAS 9 programmers transition to SAS Viya, there are inevitably questions about how new concepts in Cloud Analytic Services (CAS) relate to similar concepts in SAS. This article discusses the question, "What is the difference between a libref and a caslib?" Both are used to access data, but they are used in different situations. Briefly, a caslib can be used by any language that calls an action (SAS, Python, Lua, R, and so forth), whereas librefs are used only by procedures in the SAS language.

This article also discusses how to determine the active caslib, how to load data into a caslib, and how to read data from a caslib by using a CAS-enabled procedure in SAS.

An overview of the CAS server and client languages

I use the term "SAS 9" to refer to the traditional SAS workspace server that runs procedures in products such as Base SAS, SAS/STAT, and SAS/ETS. So "SAS 9" refers to the "classic" SAS programming environment that existed before SAS Viya.

For an overview of the CAS server and client languages, see my article about CAS-enabled procedures. I use the SAS client (called the SAS compute server) to connect to the CAS server. By using a SAS client to communicate with CAS, I can leverage my 25 years of SAS programming knowledge and skills.

What is a libref?

SAS 9 programmers are used to working with librefs, which are defined by using the LIBNAME statement in SAS. In SAS 9, you use a libref to point to a data source, which can be a file directory (such as C:/MyData) or a database such as DB2, SQL Server, Oracle, and so forth. A libref enables you to read or write data by using a SAS procedure.

Some librefs are automatically available, such as WORK or SASUSER. Others might be pre-defined by your SAS administrator so that everyone on a team can access the same data. A SAS programmer can use the LIBNAME statement to create personal librefs, which is a convenient way to organize data that belong to different projects.

The LIBNAME statement enables you to use a SAS "engine" to access data from a wide variety of data sources. An engine enables SAS procedures to process data from a non-SAS data format such as XML, JSON, Excel, or a database. This is one of the reasons companies invest in SAS: it enables programmers to concentrate on analyzing the data rather than worrying about how to read it.
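As a minimal sketch of how the LIBNAME statement assigns librefs with different engines, consider the following statements. The directory, the workbook name, and the libref names are hypothetical, and the XLSX engine assumes that SAS/ACCESS Interface to PC Files is licensed at your site:

```sas
/* Hypothetical paths: substitute locations that exist on your system */
libname proj "C:/MyData";                     /* default engine: a directory of SAS data sets */
libname budget xlsx "C:/MyData/budget.xlsx";  /* XLSX engine: worksheets appear as data sets */

proc contents data=proj._all_ nods;           /* list the data sets in the directory */
run;

libname proj clear;                           /* release the librefs when finished */
libname budget clear;
```

After a libref is assigned, a procedure refers to the data by a two-level name such as proj.MyTable, regardless of which engine reads the data.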

Many SAS programmers are familiar with the idea of associating a libref with a database or schema. For example, to connect to an ORACLE database, you might run the following LIBNAME statement:

libname mydblib ORACLE user=wicklin;  /* librefs can connect to various data sources */

I mention this because there is a CAS LIBNAME engine that you can use to access CAS tables from SAS procedures in Viya. That means you can access CAS tables from any SAS procedure, just by assigning a libref. But what does the libref refer to? It refers to a caslib in a CAS session. Therefore, it is important to understand how to define and use a caslib, which you can think of as being analogous to a libref, except that a caslib exists on the CAS server, not on the SAS client. As you read through the next sections, you will see many similarities between caslibs and librefs.

What is a caslib?

Caslibs are discussed in the documentation SAS Cloud Analytic Services: Fundamentals. A caslib is associated with a data source and includes the connection information for the data source. It enables you to read and write in-memory tables from CAS. I like to think of it as a "server side" libref, except that you can use a caslib from any client language, not just from SAS.

What is the active caslib?

When you use the CAS statement to start a CAS session, CAS creates a personal caslib for you. The name of your personal caslib is typically CASUSER(userID), which you can usually abbreviate as CASUSER. By default, this is the active caslib for the session. Here is a partial output of the log when I use the CAS statement to connect to a CAS server:

cas;                 /* connect to CAS session */
   NOTE: The session CASAUTO connected successfully to Cloud Analytic Services <name> using port 5570. <snip>
       The user is frwick and the active caslib is CASUSER(frwick).

Notice that when you connect to a CAS session, the log tells you the name of the active caslib. For me, the log states "the active caslib is CASUSER(frwick)." The log also tells me the session name, which is CASAUTO in this example.

The active caslib is the default location to read and write an in-memory data set. If you store data in the active caslib, you don't need to specify the name of the caslib when you call most actions. By default, the actions look in the active caslib to find the data.
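If you need the name of the active caslib later in a program, you do not have to scroll back to the connection note. The following sketch assumes that a session named CASAUTO is already connected. It uses the GETSESSOPT function to retrieve the value of the CASLIB session option and the LISTSESSOPTS option on the CAS statement to print all session options:

```sas
/* Assumes the CASAUTO session is already connected */
%put NOTE: The active caslib is %sysfunc(getsessopt(casauto, caslib));

cas casauto listsessopts;   /* write all session options, including CASLIB, to the log */
```

The macro-based form is convenient when you want to store the name in a macro variable and restore the active caslib at the end of a program.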

What other caslibs are defined?

Your administrator might have given you access to other predefined caslibs. You can use the CASLIB statement to display all available caslibs in the log, as follows:

caslib _all_ list;   /* display all caslibs to log */
   NOTE: Session = CASAUTO Name = CASUSER(frwick)
         Type = PATH
         Description = Personal File System Caslib
         Path = /cas/data/caslibs/casuserlibraries/frwick/
         Definition = 
         Subdirs = Yes
         Local = No
         Active = Yes
         Personal = Yes
   NOTE: Session = CASAUTO Name = Public
         Type = PATH
         Description = Shared and writeable caslib, accessible to all users.
         Path = /cas/data/caslibs/public/
         Definition = 
         Subdirs = No
         Local = No
         Active = No
         Personal = No

The log contains a list of all caslibs in the current session. One of the caslibs has the text "Active = Yes" as part of its description. That caslib is the active caslib. In this example, the active caslib is CASUSER.

There are different types of caslibs, which provide access to various data sources, such as DB2, Hadoop, Oracle, and Teradata, just to name a few. A complete list of data sources is provided in the documentation. I don't have experience with most of these data sources. I primarily use "Type = PATH" caslibs, which can store audio files, video files, SAS data sets, CSV files, and more. I also primarily use in-memory tables, but for a PATH caslib the "Path" field tells you the location (directory) in which CAS will save the data upon request.

In addition to my personal caslib, I also have access to a global caslib named 'Public'. You can change the active caslib by using the CASLIB= suboption of the SESSOPTS= option on the CAS statement. For example, the following statements change the active caslib in the CASAUTO session to 'Public', then immediately change it back to my personal caslib.

cas CASAUTO sessopts=(caslib='public');    /* make the 'public' caslib active */
cas CASAUTO sessopts=(caslib='casuser');   /* make my personal caslib active */
   NOTE: 'Public' is now the active caslib.
   NOTE: 'CASUSER(frwick)' is now the active caslib.

How does a libref differ from a caslib?

  • A libref and a caslib both point to a data source.
  • The default libref is typically WORK or SASUSER. The analogous concept is the "active" caslib. By default, the active caslib is CASUSER, which is your personal caslib.
  • A libref is defined in the SAS language and is used only by SAS procedures. A caslib exists on the CAS server and can be used by any client language that can access CAS.
  • SAS 9 programmers often use the LIBNAME statement to define new librefs. In CAS, only authorized users can define a caslib (by using the CASLIB statement).
  • If you want to read a CAS table from a SAS procedure, you need to define a libref that points to a caslib. This is shown later in this article.
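For readers who are authorized to define caslibs, the following is a minimal sketch of the CASLIB statement for a PATH caslib. The caslib name (myproj) and the directory are hypothetical. The directory must be visible to the CAS server, not merely to the SAS client, and your site's permissions might not allow the statement to succeed:

```sas
/* Hypothetical name and server-side directory */
caslib myproj datasource=(srctype="path") path="/data/projects/myproj";

caslib myproj list;    /* display the definition of the new caslib in the log */
caslib myproj drop;    /* remove the caslib when it is no longer needed */
```

Check the log after you add a caslib, because adding one can change which caslib is active for the session.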

Load data to a caslib

You can upload data from the SAS client into a caslib by using the CASUTIL procedure. (You can also use the table.loadTable action.) The following statements upload the data in Sashelp.Cars (which is a SAS data set) to the active caslib in the current CAS session. Optionally, you can use the LIST statement to list the tables in that caslib:

proc casutil;
   load data=Sashelp.Cars casout='Cars' replace; /* use active caslib */
   list tables;                                  /* show tables in active caslib */
quit;

As mentioned earlier, when you interact with the active caslib, you don't need to explicitly specify it in most actions and procedures. If you want to upload the data to a different caslib, you can use the OUTCASLIB= option on the LOAD statement.
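For example, the following sketch uploads the same data to the 'Public' caslib instead of the active caslib. It assumes that your account has permission to write to 'Public', which might not be true at every site:

```sas
proc casutil;
   load data=Sashelp.Cars outcaslib='Public' casout='Cars' replace; /* write to Public */
   list tables incaslib='Public';                                   /* show tables in Public */
quit;
```

The INCASLIB= option on the LIST statement plays the same role for reading that OUTCASLIB= plays for writing: it names a caslib explicitly so that the statement does not depend on which caslib is active.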

Read a CAS table by using a libref

If you want to use a SAS procedure to read data from a CAS in-memory table, then you need to define a libref by using the CAS LIBNAME engine. For example, the following statement defines a libref that refers to the active caslib:

libname myLib cas;     /* refers to active caslib, whatever it is */

Alternatively, you can use the CASLIB= option to specify a caslib:

libname mycasu cas caslib=casuser; /* refers to CASUSER, even if not active */

The mycasu libref refers to the CASUSER personal caslib, regardless of whether it is the active caslib.

You can use a libref to download or upload a CAS table by using the SAS DATA step. For example, the following DATA step is an alternative way to upload data from Sashelp.Cars into a table in the active caslib:

data myLib.Cars;       /* upload to active caslib */
   set Sashelp.Cars;   /* these data are on the SAS client */
run;
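The download direction works the same way, with the roles of the two data sets reversed. The following sketch assumes that a table named Cars exists in the active caslib. The name Work.CarsCopy is hypothetical:

```sas
libname myLib cas;       /* libref for the active caslib */

data Work.CarsCopy;      /* write a copy to the WORK library on the SAS client */
   set myLib.Cars;       /* read the in-memory CAS table */
run;
```

Because the output data set is in WORK, the rows are transferred from the CAS server to the SAS client, so this approach is best reserved for tables of modest size.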

You can use the libref to read data into any SAS procedure. For example, the following call to PROC MEANS computes basic statistics for some of the variables in the CAS in-memory data:

proc means data=myLib.Cars;       /* use libref to read CAS table */
   var EngineSize MPG_City Weight;
run;

PROC MEANS is a CAS-enabled procedure, so it calls an action to perform the computations for data that are in a CAS table. The log tells you that it calls the aggregate action:

NOTE: The CAS aggregation.aggregate action will be used to perform the initial summarization.

Not every CAS-enabled procedure tells you the name of the action that was called. Sometimes the log will only notify you that CAS was involved by displaying a note such as the following:

NOTE: The Cloud Analytic Services server processed the request in 0.0097 seconds.
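To see what calling an action looks like without a procedure in between, you can use PROC CAS. The following sketch calls the simple.summary action, which I chose for illustration. It is not necessarily the action that PROC MEANS calls. The sketch assumes that the Cars table is in the active caslib:

```sas
proc cas;
   /* call an action directly; no libref is needed because */
   /* the action looks for the table in the active caslib  */
   simple.summary /
      table={name="Cars"},
      inputs={"EngineSize", "MPG_City", "Weight"};
quit;
```

Notice that the action refers to the table by its name in a caslib, not by a libref. This is why a caslib is usable from any client language that can call actions.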

Summary

In summary, this article discusses the concept of a caslib, including how to learn the names of the defined caslibs and the active caslib. Caslibs are used by all actions and procedures that interact with data tables on the CAS server. In contrast, a libref is the familiar SAS-only reference, which you can use in SAS procedures to read data from a variety of data sources, including a CAS table.

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.