Welcome to the fourth installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, how to execute CAS actions, and how to work with the results. Now it's time to understand how your data is organized on the CAS server. To understand how the data is catalogued you must understand caslibs.
So the big question is, "What exactly is a caslib?"
Let's start with an image as an introduction to caslibs.
A caslib contains connection information about a data source and an in-memory space to process data. A caslib also contains session information and a variety of access controls for users and scope. In this post I'll focus on exploring the in-memory and data source portion of a caslib.
- The data source portion holds information about the data source, such as the path to a folder of data files like xlsx, sas7bdat, csv, txt, or sashdat. It can also represent a database connection.
- The in-memory portion contains data source files loaded into memory and available for distributed processing as CAS tables.
For detailed information about caslibs, visit the SAS documentation.
Let's look at an example. You can follow along by logging into SAS Viya and making a connection to the CAS server with your Python client. I will be using SAS Viya for Learners (using the Jupyter Notebook option), and have already made my connection to CAS and named my connection conn. For more information about making a connection to CAS, visit Part 1 of the series.
After making a connection to CAS, use the caslibInfo CAS action to view the available caslibs.
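Here is a minimal sketch of that call. It assumes conn is the CAS connection object created in Part 1. The wrapper name list_caslibs is only for illustration and is not part of the SWAT package.

```python
def list_caslibs(conn):
    """Run the caslibInfo action and return its results.

    `conn` is assumed to be an existing CAS connection (see Part 1).
    `list_caslibs` is a hypothetical helper name used for this example.
    """
    # Calling the action as a method on the connection runs it on the
    # CAS server and returns the results to the Python client.
    return conn.caslibInfo()
```

In a notebook cell with a live connection, running list_caslibs(conn), or simply conn.caslibInfo(), displays the table of caslibs.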
The caslibInfo action returns a table that includes the name of each caslib, its type, a description, and the path. The table also includes caslib metadata:
- The Local column indicates the scope of the caslib. A 0 represents a global caslib, and a 1 is local.
- The Active column indicates if the caslib is active. A 1 represents an active caslib. If a CAS action has no caslib specified, the active caslib is used.
- The Personal column indicates if the caslib is only available to you. By default, all users get a casuser caslib, only accessible by that user.
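The flags described above can also be read programmatically. The sketch below picks out the active caslib by name. It assumes the caslibInfo table behaves like a pandas DataFrame with Name and Active columns, and that you have already pulled that table out of the action results (I believe it is stored under the key CASLibInfo, but check your own output). The function name active_caslib_names is hypothetical.

```python
def active_caslib_names(caslib_table):
    """Return the names of the caslibs flagged as active.

    `caslib_table` is assumed to be the DataFrame-style table returned by
    the caslibInfo action, containing the Name and Active columns.
    """
    # to_dict("records") converts each row into a plain dictionary,
    # which keeps the filtering logic easy to read.
    rows = caslib_table.to_dict("records")
    # A value of 1 in the Active column marks the active caslib.
    return [row["Name"] for row in rows if row["Active"] == 1]
```

The same pattern works for the Local and Personal columns by swapping the column name in the filter.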
View Available Data Source Files in a Caslib
Next, let's view the data source files in the casuser caslib. Data source files are physical files available to the CAS server. These are known as server-side files since they are associated with a caslib.
To view the available data source files, use the fileInfo action with the caslib parameter. Note that if you do not use the caslib parameter, the fileInfo action uses the active caslib.
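A minimal sketch of the fileInfo call follows. It assumes conn is the CAS connection from Part 1 and defaults to the casuser caslib used in this post. The wrapper name list_source_files is hypothetical.

```python
def list_source_files(conn, caslib="casuser"):
    """Run the fileInfo action for one caslib and return its results.

    `conn` is assumed to be an existing CAS connection (see Part 1).
    `list_source_files` is a hypothetical helper name used for this example.
    """
    # The caslib parameter selects which caslib's data source to list.
    # Leaving it out makes the action use the active caslib instead.
    return conn.fileInfo(caslib=caslib)
```

With a live connection, list_source_files(conn) is equivalent to running conn.fileInfo(caslib="casuser") directly in a notebook cell.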
The results of the fileInfo action show that my casuser caslib's data source portion contains csv, txt and sas7bdat files.
View Available In-Memory Tables in a Caslib
Now, let's see if any tables are loaded into memory. To view available in-memory tables, use the tableInfo action with the caslib parameter.
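The tableInfo call follows the same pattern. This sketch assumes conn is the CAS connection from Part 1, and the wrapper name list_loaded_tables is hypothetical.

```python
def list_loaded_tables(conn, caslib="casuser"):
    """Run the tableInfo action for one caslib and return its results.

    `conn` is assumed to be an existing CAS connection (see Part 1).
    `list_loaded_tables` is a hypothetical helper name used for this example.
    """
    # tableInfo reports on the in-memory portion of the caslib,
    # whereas fileInfo reports on the data source portion.
    return conn.tableInfo(caslib=caslib)
```

With a live connection, list_loaded_tables(conn) is equivalent to running conn.tableInfo(caslib="casuser") in a notebook cell.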
The results of the tableInfo action show one table named CARS loaded into memory. Once a table is loaded into the in-memory portion, you can begin processing the data. Here I'll make a reference to the CARS CAS table using the CASTable method. Then I'll use the head method to view the first 5 rows of the table.
cars = conn.CASTable("CARS", caslib="casuser")
cars.head()
The results of the head method show the first 5 rows of the CARS table.
In conclusion, the most important concept to understand about caslibs is that each one has two main areas: data source and in-memory. The former contains connection information to a data source, and the latter holds the CAS tables available for processing. Using the actions above, you can easily explore the data in your environment.
In Part 5 of the series, we'll look at loading data into memory. Stay tuned.