Create User Defined Functions (UDFs) for the CAS Server on SAS Viya

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. In this post, I'll show how to create user defined functions (UDFs) for the distributed CAS server using SAS and CASL code. Once the UDF is created, you can use it on the CAS server with programming languages like SAS, Python, R and more! Creating UDFs for CAS enables massively parallel processing (MPP) of your big data. If you want to create a UDF using Python and the SWAT package, check out my post Python Integration to SAS® Viya® - Part 22 - Create User Defined Functions (UDFs).

Load and prepare data in the CAS server

First, I'll create some fake data to use. This script creates a simple CAS table containing high and low temperatures and then previews the data. The data in this example is small for training purposes; processing data in the CAS server's massively parallel processing environment is typically reserved for larger data. The script also creates a CAS session if you do not already have one.

/*********************************/
/* Confirm CAS session is active */
/* If not, create CAS session    */
/*********************************/
%let cas_session_exists= %sysfunc(sessfound(&_CASNAME_));
%if &cas_session_exists=0 %then %do; 
	cas conn;
%end;
%else %do;
	%put NOTE:CAS session &_CASNAME_ already active;
%end;
 
/*****************************************/
/* Create a libref to the Casuser caslib */
/*****************************************/
libname casuser cas caslib='casuser';
 
 
/*****************************/
/* Create the test CAS table */
/*****************************/
data casuser.tempdata;
	do Temp = 'HighTemp = 83; LowTemp = 55;',
	          'HighTemp = 86; LowTemp = 59;',
	          'HighTemp = 92; LowTemp = 63;',
	          'HighTemp = 91; LowTemp = 65;',
	          'HighTemp = 80; LowTemp = 51;';
	    output;
	end;
run; 
 
/* Preview the CAS table */
proc print data=casuser.tempdata(obs=5);
run;

The results show the CAS table TEMPDATA was created in the Casuser caslib and contains a single character column.

Objective

My objective is to extract the numeric Fahrenheit values from the string and place them in their own columns, then use those values to create two new columns with the Celsius values. My final data should contain the original Temp column plus four numeric columns: HighTempF, LowTempF, HighTempCelsius, and LowTempCelsius.

There are a variety of ways to solve this problem. However, in my solution I want to create two UDFs to accomplish the task. Then, the UDFs can be used in my code or by other users.

Create the UDFs Using a CAS Action

In this example, I'll focus on creating the UDFs for the CAS server using the CAS language (CASL). If you don't need the power of the massively parallel processing CAS server, you can create these functions on the familiar SAS Compute server on the SAS Viya platform using the traditional FCMP procedure. Here, my objective is to create the functions so they run on the distributed CAS server. Once they are created for CAS, they can also be used with languages like SAS, Python, and R.

To create UDFs for the CAS server you need the fcmpact action set. It contains a variety of CAS actions; the one we need is the addRoutines action, which works similarly to the FCMP procedure. The main parameter of the action is routineCode, which specifies the FCMP routine's code (the function definitions) as a string. If you have used the FCMP procedure before, this will look familiar. The main difference is that you use the CAS procedure to execute the action. The CAS procedure uses CASL, so it's a bit different than writing traditional SAS code. Below is the code to create the two UDFs.

/***********************************/
/* Create a UDF for the CAS server */
/***********************************/
/* FCMP Action Set: https://go.documentation.sas.com/doc/en/pgmsascdc/default/caspg/cas-fcmpact-TblOfActions.htm */
proc cas;
 
	/* The subroutine code to create the get_temp_value UDF */
	source get_temp_value_func;
	    function get_temp_value(colname $, position);
 
	        /* Get the statement by position */
	        get_statement_from_position = scan(colname, position,';');
 
	        /* Get the number from the string */
	        get_number_as_string = scan(get_statement_from_position, -1, ' ');
 
	        /* Get the number from the statement and convert to a numeric column */
	        convert_string_to_numeric = input(get_number_as_string, 8.);
 
	        /* Return numeric value */
	        return(convert_string_to_numeric);
 
	    endsub;
	endsource;
 
	/* The subroutine code to create the f_to_c UDF */
	source f_to_c_func;
	    function f_to_c(f_temp);
 
	        /* Convert the Fahrenheit temp to Celsius */
	        c_temp = round((f_temp - 32) * (5/9));
 
	        /* Return Celsius value */
	        return(c_temp);
 
	    endsub;
	endsource;
 
	/* Create the UDFs */
	fcmpact.addRoutines /
	    routineCode = cats(get_temp_value_func, f_to_c_func),  /* Concatenate the two string variables that hold the function code */
	    saveTable   = True,                                     /* Save the table as a source file */
	    funcTable   = {name = "my_udfs_sas",                    /* Create the CAS table */
	                   caslib = 'casuser',
	                   replace = TRUE},
	    appendTable = True;                                     /* Append the functions to the table if it already exists */
quit;

The code does the following:

Creating the routine code:

  • The SOURCE statements create two string variables that hold the routine code. I use two separate variables so that each function lives in its own string. You can put both functions in one string variable if you prefer.
  • Within the routine code you start a function with the FUNCTION statement and end it with the ENDSUB statement. After the FUNCTION statement you specify the function name and arguments. A character argument requires a $ sign after the argument name.
  • In the get_temp_value function I use the SAS SCAN function to select part of a string. Then I use the SAS INPUT function to convert the character value to numeric.
  • In the f_to_c function I use the formula to convert Fahrenheit to Celsius. I use the SAS ROUND function to round the value to a whole number.
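
To sanity-check what the two functions should return before running them in CAS, the same logic can be mirrored outside SAS. The following is a minimal Python sketch, not the FCMP code itself. The names get_temp_value and f_to_c are reused only for readability, the string splitting approximates the SAS SCAN function with its default behavior, and the rounding assumes that SAS ROUND rounds halves away from zero (which differs from Python's built-in round).

```python
import math


def get_temp_value(colname: str, position: int) -> float:
    """Approximate the FCMP get_temp_value function in plain Python."""
    # Like scan(colname, position, ';'): split on ';' and skip empty pieces
    statements = [s for s in colname.split(";") if s != ""]
    statement = statements[position - 1]  # SAS positions are 1-based

    # Like scan(statement, -1, ' '): take the last blank-delimited word
    number_as_string = statement.split()[-1]

    # Like input(number_as_string, 8.): convert the text to a number
    return float(number_as_string)


def f_to_c(f_temp: float) -> int:
    """Convert Fahrenheit to Celsius, rounded to a whole number."""
    celsius = (f_temp - 32) * 5 / 9
    # Round half away from zero (assumed to match SAS ROUND)
    return int(math.copysign(math.floor(abs(celsius) + 0.5), celsius))


# The same five rows that the test CAS table holds
rows = [
    "HighTemp = 83; LowTemp = 55;",
    "HighTemp = 86; LowTemp = 59;",
    "HighTemp = 92; LowTemp = 63;",
    "HighTemp = 91; LowTemp = 65;",
    "HighTemp = 80; LowTemp = 51;",
]

for temp in rows:
    high_f = get_temp_value(temp, 1)
    low_f = get_temp_value(temp, 2)
    # The first row prints: 83.0 55.0 28 13
    print(high_f, low_f, f_to_c(high_f), f_to_c(low_f))
```

Comparing these values with the output of the CAS table created later is a quick way to confirm the UDFs behave as intended.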

Creating the UDFs with an action:

After the routine code is created, use the addroutines CAS action to create the functions in CAS. The addroutines action creates an in-memory CAS table with the function definitions. In the action, I'll use the following parameters:

  • The routineCode parameter specifies the FCMP routine's code that is saved to the table. In this parameter I'll concatenate the two strings with the routine code from the SOURCE statements.
  • The appendTable parameter specifies to append the new functions to an existing CAS table. In this example it's not required since I'm not appending the functions to an existing table. If you need to add new functions to an existing CAS table, this is required.
  • The funcTable parameter specifies the CAS table name and location where the FCMP functions are written. Since UDFs are stored in CAS tables, make sure you specify both the CAS table name and the caslib. Here a CAS table named MY_UDFS_SAS is created in the Casuser caslib. If you want to share the UDFs with other users, save the CAS table to a caslib that they can access.
  • The saveTable parameter specifies if the FCMP table should be saved to disk. If the value is set to True, the action saves the file to disk in the specified caslib using the name of the CAS table from above (in all uppercase) with the sashdat extension. Once the CAS table is saved to disk, you (or others) can load this file into memory anytime you need the functions.

I'll run the code and view the log.

NOTE: Active Session now CASAUTO.
NOTE: Added action set 'fcmpact'.
NOTE: Cloud Analytic Services saved the file MY_UDFS_SAS.sashdat in caslib CASUSER(Peter).
NOTE: PROCEDURE CAS used (Total process time):
      real time           0.14 seconds
      cpu time            0.02 seconds

The results show the MY_UDFS_SAS.sashdat file was created on disk in the Casuser caslib as expected.

By default, the addroutines action also creates the table in-memory. I can confirm the table and source file were created using the CASUTIL procedure.

/* View the CAS table and source file */
proc casutil incaslib = 'casuser';
     list tables;
     list files;
quit;

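[Screenshots not shown: the in-memory table listing (MY_UDFS_SAS) and the source file listing (MY_UDFS_SAS.sashdat) for the Casuser caslib.]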

The results show that the addroutines action automatically creates the MY_UDFS_SAS CAS table. That table contains the function definitions. If the functions were already created and the table was not loaded into memory, you can use the loadfcmptable action to load the sashdat file you created above.

proc cas;
     fcmpact.loadFcmpTable /
          table='MY_UDFS_SAS.sashdat', 
          caslib = 'casuser';
quit;
 
/* and the results */
NOTE: Active Session now CASAUTO.
ERROR: The table MY_UDFS_SAS already exists in caslib CASUSER(Peter).
ERROR: The action stopped due to errors.
NOTE: PROCEDURE CAS used (Total process time):
      real time           0.05 seconds
      cpu time            0.00 seconds

The results here return an error since the CAS table is already in-memory. If the CAS table wasn't loaded into memory, you would need to load the sashdat file first to use the UDFs.

There is one more step before we can use the functions in a DATA step through the CAS engine. Since the DATA step is submitted through the CAS engine, the cmplib option must be set for both the CAS server and the Compute server. The cmplib option specifies one or more tables that contain the functions. For the CAS server you set it as a CAS session option (here through the SESSOPTS= system option), and for the Compute server you set the CMPLIB= system option directly. Without these settings SAS won't know that the functions exist and you will receive an error. If you use the runCode action to run the DATA step in CAS directly, you only need to set the CAS cmplib option. I prefer using the DATA step through the CAS engine.

/* Modify the cmplib option for the CAS and Compute servers to use the function CAS table */
options sessopts=(cmplib='casuser.my_udfs_sas')  cmplib=(casuser.my_udfs_sas);

 

Once the functions are created and the cmplib system options are set, you can use the functions in the SAS DATA step (or other languages if you want). I'll start by previewing the original data again.

/* Preview the original CAS table */
proc print data=casuser.tempdata(obs=5);
run;

Next, use the new functions. I'll use the DATA step through the CAS engine to create new columns with the UDFs. I'll add the PRINT procedure at the end to preview the new clean CAS table.

/* Use the function on the CAS table to run in the distributed CAS server */
data casuser.final_sas;
    set casuser.tempdata;
    HighTempF = get_temp_value(Temp,1);
    LowTempF = get_temp_value(Temp,2);
    HighTempCelsius = f_to_c(HighTempF);
    LowTempCelsius = f_to_c(LowTempF);
run;
 
/* Preview the clean data */
proc print data=casuser.final_sas(obs=5);
run;

The results show that the functions worked! I can confirm they ran in the CAS server by viewing the log.

NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 5 observations read from the table TEMPDATA in caslib CASUSER(Peter).
NOTE: The table final_sas in caslib CASUSER(Peter) has 5 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.52 seconds
      cpu time            0.06 seconds

 

The log shows that the DATA step ran in Cloud Analytic Services (CAS) as expected and the CAS table FINAL_SAS was created successfully.

Using the UDFs in Other Languages

So maybe you are a SAS coder and you have coworkers that use Python or R. Your coworkers want to use the new functions you created in CAS. How can they go about using them? Well, it's pretty easy! Since the function definitions were saved in the MY_UDFS_SAS.sashdat file, they simply need to:

  1. load the fcmpact action set using the builtins.loadActionSet action
  2. load the MY_UDFS_SAS.sashdat file into memory on the CAS server using the fcmpact.loadFcmpTable action
  3. lastly, use the sessionProp.setSessOpt action to set the cmplib parameter to point to the CAS table that contains the function definitions
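
For a Python coworker, the three steps above might look like the sketch below. It assumes `conn` is an existing `swat.CAS` connection (SWAT is not imported here, so the function relies only on the action calls), that the .sashdat file sits in a caslib the caller can read, and that the table and caslib names match the ones used earlier in this post. The helper name `enable_udfs` is hypothetical, and the `table.tableExists` check is an extra step added to avoid the "already exists" error shown earlier.

```python
def enable_udfs(conn, table="my_udfs_sas", caslib="casuser"):
    """Make the saved UDFs available in the CAS session behind `conn`.

    `conn` is assumed to be a swat.CAS connection, or any object that
    exposes the same CAS action calls.
    """
    # Step 1: load the fcmpact action set
    conn.builtins.loadActionSet(actionSet="fcmpact")

    # Extra check (not one of the three steps): is the table already in memory?
    exists = conn.table.tableExists(name=table, caslib=caslib)["exists"]
    if not exists:
        # Step 2: load the saved .sashdat file into memory
        conn.fcmpact.loadFcmpTable(
            table=f"{table.upper()}.sashdat", caslib=caslib
        )

    # Step 3: point the session's cmplib option at the function table
    conn.sessionProp.setSessOpt(cmplib=f"{caslib}.{table}")
```

After this runs, the functions can be called from code that CAS executes in that session, for example a DATA step submitted through the runCode action.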

The one thing to be aware of in this example is that the function file was stored in the Casuser caslib, which is a personal caslib. To share the functions with other users you must store the file in a shared caslib.

Summary

The SAS Viya platform provides two analytic engines to process your data: the traditional SAS Compute server and the CAS server. The Compute server allows you to execute traditional SAS code as you always have. If you want to create a UDF for your traditional SAS processing in SAS Viya, continue to use the FCMP procedure. If you want to use the CAS server's massively parallel processing capabilities, you will need to use the addRoutines action to create your UDFs. CAS also enables you to create a UDF once and then use other languages such as Python and R to process your distributed data. Remember, if you are a SAS programmer and need to create UDFs for CAS, you will need to use the CAS action instead of the FCMP procedure, which can feel a bit different at first.

About Author

Peter Styliadis

Technical Training Consultant

Peter Styliadis is a Technical Training Consultant at SAS on the Foundations team in the Education division. The team focuses on course development and customer training. Currently Peter spends his time working with SAS Programming, Structured Query Language (SQL), SAS Visual Analytics, Programming in Viya and Python. Peter is from the Rochester/Buffalo New York area and does not miss the cold weather up north. After work, Peter can be found spending time with his wife, child and three dogs, as well as hiking and spending time at home learning something new. Prior to joining SAS, Peter worked at his family's restaurant (think My Big Fat Greek Wedding), worked in customer service, then became a High School Math and Business teacher. To connect with Peter, find him on LinkedIn.

4 Comments

  1. Peter Styliadis

    Hi Bart, this looks like a version issue. I wrote the blog using SAS Viya release Stable 2023.10. SAS Viya for Learners is an older release.

    I tested the code and got the same error and then found another issue.

    1. First issue is creating the UDF. I played around with the action and realized that when you have the appendTable option it won't create the function table in Viya 3.5. Remove the appendTable option and it worked. Not sure why that happens.

    2. Once you create the UDF there is another issue. When I tried to apply the UDF in the DATA step, I couldn't get it to work. It seems like you can't use the DATA step to apply a UDF in Viya 3.5 (I'd have to test this more to be sure). Instead, it seems like you have to use the runProgram action (https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/caspg/cas-fcmpact-runprogram.htm). It works like the DATA step.

    Here is the solution code for you.

    libname casuser cas caslib='casuser';

    /* Create fake data */
    data casuser.test_table;
    do value= 1,2,3,4,5;
    output;
    end;
    run;
    proc print data=casuser.test_table;
    run;

    options msglevel=I;
    /* I removed the appendTable option */
    proc cas;
    fcmpact.addRoutines /
    routineCode = '
    function fcmpFunction (x);
    return ( x * 10 );
    endfunc;
    ',
    saveTable = True,
    funcTable = {
    name = "my_udfs_sas",
    caslib = 'casuser',
    replace = TRUE
    };
    quit;

    /* The table my_udfs_sas and my_udfs_sas.sashdat file is now created. */
    proc casutil incaslib = 'casuser';
    list tables;
    list files;
    quit;

    /* Set options to use the UDF */
    options sessopts=(cmplib='casuser.my_udfs_sas') cmplib=(casuser.my_udfs_sas);

    /* Check cmplib option in CAS and confirm it is set to casuser.my_udfs_sas*/
    proc cas;
    listsessopts;
    quit;

    /* The DATA step with the UDF errors out in Viya 3.5 */
    data casuser.new_table;
    set casuser.test_table;
    newvalue = fcmpFunction(Value);
    run;

    /* Looks like you have to use this method instead of the familiar DATA step in Viya 3.5 */
    proc cas;
    fcmpact.runProgram /
    /* Add data step statements here */
    routineCode = "
    newValue = fcmpFunction(Value);
    other = 'Hi!';
    if newValue > 40 then value_group = 'High';
    else value_group = 'low';
    ",
    inputData={name="test_table", caslib = 'casuser'},
    outputData={name="test_table_new", caslib = 'casuser', replace=true};
    quit;

    proc print data=casuser.test_table_new;
    run;

  2. Hi Peter,

    I was trying your example on the Viya for Learners 3.5 server and hit a wall. I executed the following in a brand new session:

    cas;
    libname casuser cas caslib='casuser';

    options msglevel=I;
    proc cas;
    fcmpact.addRoutines /
    routineCode = {'function fcmpFunction (x); return ( x * 10 ); endfunc;'}
    ,saveTable = True
    ,funcTable =
    {name = "my_udfs_sas"
    ,caslib = 'casuser'
    ,replace = TRUE
    }
    ,appendTable = True
    ;
    quit;

    and no `my_udfs_sas` dataset was generated in `casuser` library, the log was only:

    92 ;
    93 quit;
    NOTE: Active Session now CASAUTO.
    NOTE: PROCEDURE CAS used (Total process time):
    real time 0.00 seconds
    cpu time 0.01 seconds

    Is there any additional option I need to set to enable creating datasets? Or is it just the Viya for Learners 3.5 setup?

    All the best
    Bart
