CAS-Action! Saving Frequency Tables - Part 2

0

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. In my previous post CAS-Action! Simple Frequency Tables - Part 1, I reviewed how to use the simple.freq CAS action to generate frequency distributions for one or more columns using the distributed CAS server. In this post I will show you how to save the results of the freq action as a SAS data set or a distributed CAS table.

In this example, I will use the CAS language (CASL) to execute the freq CAS action. Be aware, instead of using CASL, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax in other languages.

Load the demonstration data into memory

I'll start by executing the loadTable action to load the WARRANTY_CLAIMS_0117.sashdat file from the Samples caslib into memory. By default the Samples caslib should be available in your SAS Viya environment. I'll load the table to the Casuser caslib and then I'll clean up the CAS table by renaming and dropping columns to make the table easier to use. For more information how to rename columns check out my previous post. Lastly I'll execute the fetch action to preview 5 rows.

proc cas;
   * Specify the input/output CAS table *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Load the CAS table into memory *;
    table.loadtable / 
        path = "WARRANTY_CLAIMS_0117.sashdat", caslib = "samples",
        casOut = casTbl + {replace=TRUE};
 
* Rename columns with the labels. Spaces replaced with underscores *;
 
   *Store the results of the columnInfo action in a dictionary *;
   table.columnInfo result=cr / table = casTbl;
 
   * Loop over the columnInfo result table and create a list of dictionaries *;
   listElementCounter = 0;
   do columnMetadata over cr.ColumnInfo;
	listElementCounter = listElementCounter + 1;
	convertColLabel = tranwrd(columnMetadata['Label'],' ','_');
	renameColumns[listElementCounter] = {name = columnMetadata['Column'], rename = convertColLabel, label=""};
   end;
 
   * Rename columns *;
   keepColumns = {'Campaign_Type', 'Platform','Trim_Level','Make','Model_Year','Engine_Model',
                  'Vehicle_Assembly_Plant','Claim_Repair_Start_Date', 'Claim_Repair_End_Date'};
   table.alterTable / 
	name = casTbl['Name'], caslib = casTbl['caslib'], 
	columns=renameColumns,
	keep = keepColumns;
 
   * Preview CAS table *;
   table.fetch / table = casTbl, to = 5;
quit;

The results above show a preview of the warranty_claims CAS table.

One Way Frequency for Multiple Columns

Next, I'll execute the freq action to generate a frequency distribution for multiple columns.

proc cas;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
   colNames = {'Model_Year', 
               'Vehicle_Assembly_Plant', 
	       {name = 'Claim_Repair_Start_Date', format = 'yyq.'}
   };
   simple.freq / table= casTbl, inputs = colNames;
quit;

The freq CAS action returns the frequency distribution of each column in a single result. While this is great,  what if you want to create a visualization with the data? Or continue processing the summarized data? How do you save this as a table? Well, you have a few options.

Save the results as a SAS data set

First, you can save the results of a CAS action as a SAS data set. The idea here is the CAS action will process the data in the distributed CAS server, and then the CAS server returns smaller, summarized results to the client (SAS Studio). The summarized results can then be saved as a SAS data set.

To save the results of a CAS action simply add the result option after the action with a variable name. The results of an action return a dictionary to the client and store it in the specified variable. For example, to save the results of the freq action as a SAS data set complete the following steps:

  1. Execute the same CASL code from above, but this time specify the result option with a variable name to store the results of the freq action. Here i'll save the results in the variable freq_cr.
  2. Use the DESCRIBE statement to view the structure and data type of the CASL variable freq_cr in the log (not required).
  3. Use the SAVERESULT statement to save the CAS action result table from the dictionary freq_cr as a SAS data set named warranty_freq. To do this specify the key Frequency that is stored in the dictionary freq_cr to obtain the result table.
proc cas;
   * Reference the CAS table *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Specify the columns to analyze *;
   colNames = {'Model_Year', 
               'Vehicle_Assembly_Plant', 
               {name = 'Claim_Repair_Start_Date', format = 'yyq.'}
   };
   * 1. Analyze the CAS table and store the results *;
   simple.freq result = freq_cr / table= casTbl, inputs = colNames;
 
   * 2. View the dictionary in the log *;
   describe freq_cr;
 
  * 3. Save the result table as a SAS data set *;
   saveresult freq_cr['Frequency'] dataout=work.warranty_freq;
quit;

SAS Log

In the log, the results of the DESCRIBE statement shows the variable freq_cr is a dictionary with one entry. It contains the key Frequency and the value is a result table. The table contains 22 rows and 6 columns. The NOTE in the log shows the SAVERESULT statement saved the result table from the dictionary as a SAS data set named warranty_freq in the work library.

Once the summarized results are stored in a SAS library, use your traditional SAS programming knowledge to process the SAS table. For example, now I can visualize the summarized data using the SGPLOT procedure.

* Plot the SAS data set *;
title justify=left height=16pt "Total Warranty Claims by Year";
proc sgplot data=work.warranty_freq noborder;
	where Column = 'Model_Year';
	vbar Charvar / 
		response = Frequency
		nooutline;
	xaxis display=(nolabel);
	label Frequency = 'Total Claims';
	format Frequency comma16.;
quit;

Save the Results as a CAS Table

Instead of saving the summarized results as a SAS data set, you can create a new CAS table on the CAS server. To do that all you need is to add the casOut parameter in the action. Here I'll save the results of the freq CAS action to a CAS table named warranty_freq in the Casuser caslib, and I will give the table a descriptive label.

proc cas;
   * Reference the CAS table *;
   casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"};
 
   * Specify the columns to analyze *;
   colNames = {'Model_Year', 
               'Vehicle_Assembly_Plant', 
               {name = 'Claim_Repair_Start_Date', format = 'yyq.'}
   };
 
   * Analyze the CAS table and create a new CAS table *;
   simple.freq / 
	table= casTbl, 
	inputs = colNames,
	casOut = {
		name = 'warranty_freq',
		caslib = 'casuser',
		label = 'Frequency analysis by year, assembly plant and repair date by quarter'
	};
quit;

The results above show the freq action returned information about the newly created CAS table. Once you have a CAS table in the distributed CAS server you can continue working with it using CAS, or you can visualize the data like we did before using SGPLOT. The key concept here is the SGPLOT procedure does not visualize data on the CAS server. The SGPLOT procedure returns the entire CAS table back to SAS (compute server) as a SAS data set, then the visualization occurs on the client. This means if the CAS table is large, an error or slow processing might occur. However, in our scenario we created a smaller summarized CAS table, so sending 22 rows back to the client (compute server) isn't going to be an issue.

* Make a library reference to a Caslib *;
libname casuser cas caslib='casuser';
 
 
* Plot the SAS data set *;
title justify=left height=16pt "Total Warranty Claims by Year";
proc sgplot data=casuser.warranty_freq noborder;
	where _Column_ = 'Model_Year';
	vbar _Charvar_ / 
		response = _Frequency_
		nooutline;
	xaxis display=(nolabel);
	label _Frequency_ = 'Total Claims';
	format _Frequency_ comma16.;
quit;

Summary

Using the freq CAS action enables you to generate a frequency distribution for one or more columns and enables you to save the results as a SAS data set or a CAS table. They keys to this process are:

  • CAS actions execute on the distributed CAS server and return summarized results back to the client as a dictionary. You can store the dictionary using the result option.
  • Using dictionary manipulation techniques and the SAVERESULT statement you can save the summarized result table from the dictionary as a SAS data set. Once you have the SAS data set you can use all of your familiar SAS programming knowledge on the traditional compute server.
  • Using the casOut parameter in a CAS action enables you to save the summarized results in the distributed CAS server.
  • The SGPLOT procedure does not execute in CAS. If you specify a CAS table in the SGPLOT procedure, the entire CAS table will be sent back to SAS compute server for processing. This can cause an error or slow processing on large tables.
  • Best practice is to summarize large data in the CAS server, and then work with the summarized results on the compute server.

Additional resources

freq action
DESCRIBE statement
SAVERESULT statement
Plotting a Cloud Analytic Services (CAS) In-Memory Table
SAS® Cloud Analytic Services: CASL Programmer’s Guide 
SAS® Cloud Analytic Services: Fundamentals
CAS Action! - a series on fundamentals
Getting Started with Python Integration to SAS® Viya® - Index

 

Share

About Author

Peter Styliadis

Technical Training Consultant

Peter Styliadis is a Technical Training Consultant at SAS on the Foundations team in the Education division. The team focuses on course development and customer training. Currently Peter spends his time working with SAS Programming, Structured Query Language (SQL), SAS Visual Analytics, Programming in Viya and Python. Peter is from the Rochester/Buffalo New York area and does not miss the cold weather up north. After work, Peter can be found spending time with his wife, child and three dogs, as well as hiking and spending time at home learning something new. Prior to joining SAS, Peter worked at his family's restaurant (think My Big fat Greek Wedding), worked in customer service, then became a High School Math and Business teacher. To connect with Peter, feel free to connect on LinkedIn.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top