CAS-Action! Filtering Rows in CAS Tables

4

Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's learn how to filter rows in CAS tables.

In this example, I will use the CAS procedure to filter rows in CAS tables for use in CAS actions. I could execute similar code in Python, R and more with some slight changes to the syntax for the specific language. The other languages also include specific methods to accomplish the same objective.

I'll start by executing the following code to create a CAS table named PRODUCTS in the Casuser caslib with 80 million rows. Then, I'll create a variable named productsTbl to reference the PRODUCTS CAS table, and use that variable as the value to the table parameter in the numRows and fetch CAS actions.

proc cas;
    productsTbl = {name = 'products', caslib = 'casuser'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

The numRows action shows the PRODUCTS table has 80 million rows, and the fetch action shows the first 20 rows of the table.

Create a Simple Filter

To filter a CAS table you add the where sub parameter to the castable reference with the where expression as a string. The string is similar to using the WHERE statement in SAS.

Here, I'll add the where key to the productsTbl dictionary that references the CAS table, and filter for all rows where the Product column equals Hats. The where expression is only applied when an action is executed on the productsTbl variable (the CAS table reference). The underlying CAS table is not modified.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'Product = "Hats"'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

After executing the numRows and fetch actions, the results show that there are 8,001,336 rows that contain the product Hats.

Use a SAS Function to Filter a CAS Table

Within the where expression you can also use most SAS Functions. Here, I'll use the UPCASE function to upper case all values in the Product column to standardize the case, then filter for rows with the value HATS. The results will be the same as the previous section since there are no underlying case issues in the CAS table. Where you would see a difference would be if the table contained mixed case versions of Hats, such as hats, HAts, hATs, etc. Without the function, the query is case sensitive and would only pick up Hats. All iterations of Hats would be picked up with using the UPCASE function.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS"'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

Multiple WHERE Expressions

Just like the WHERE statement in SAS, you can specify two or more where expressions with a logical operator. Here I'll filter the CAS table for all rows where the Product column contains the value HATS, and a StoreID less than 15. The entire where expression is stored as a string.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS" and StoreID < 15'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

Notice that the results show that 1,789,106 rows have the product hats with a store ID less than 15.

Create a Calculated Column as a Filter

You can create a calculated expression to use as a filter. The calculated expression is not added as a column to the table output, instead it is used to filter the CAS table during the execution of the actions. Here I'll multiple the Price by Quantity, and filter for all rows greater than 40.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS" and 
                                    StoreID < 15 and 
                            Price * Quantity > 40'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

Notice in the results the calculated column is not output, it was only used to filter the rows. With all three where expressions, we now have a total of 436,728 rows.

Storing the Where Expressions in a Variable

Lastly, I want to clean up my code. Instead of adding the where expressions as a single string, I can use a SOURCE block in the CAS language. The SOURCE block enables embedding text in the program and assigning it to a given variable. It avoids any possible quoting issues and helps compartmentalize your code. Here, I'll name my SOURCE block filter and add the where expressions. Then, I'll use the filter variable as the value to the where key. The results are the same as the previous code.

proc cas;
    source filter;
        upcase(Product) = "HATS" and 
        StoreID < 15 and 
        Price * Quantity > 40;
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = filter};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

Summary

In summary, using the where sub-parameter allows you to easily filter your tables for use in CAS actions. A few key points:

  • The where sub parameter takes a string as its value and is similar to using the WHERE expression in SAS.
  • Using the where sub-parameter does not modify the original CAS table. It is simply applied when executing an action on the CAS table reference.
  • Within the where expression you can use most SAS functions.
  • If you create a dictionary variable with the CAS table reference, you can then use that variable in a variety of actions within the table parameter.
  • Using a SOURCE block enables you to embed your where expressions in a variable without using a string.

Additional resources

fetch Action
SAS® Cloud Analytic Services: Fundamentals
Code

Share

About Author

Peter Styliadis

Technical Training Consultant

Peter Styliadis is a Technical Training Consultant at SAS on the Foundations team in the Education division. The team focuses on course development and customer training. Currently Peter spends his time working with SAS Programming, Structured Query Language (SQL), SAS Visual Analytics, Programming in Viya and Python. Peter is from the Rochester/Buffalo New York area and does not miss the cold weather up north. After work, Peter can be found spending time with his wife, child and three dogs, as well as hiking and spending time at home learning something new. Prior to joining SAS, Peter worked at his family's restaurant (think My Big fat Greek Wedding), worked in customer service, then became a High School Math and Business teacher. To connect with Peter, feel free to connect on LinkedIn.

4 Comments

  1. Peter Styliadis
    Peter Styliadis on

    Hi Jason. To confirm, you want to store DATA step code in a SOURCE block, then replace a specific part of that DATA step in the source block with the value of i? If I am understanding correctly, this code below should help.

    A SOURCE block simply stores text. See this code:

    * Notice that the source block simply stores a string in a variable *;
    proc cas;
    source ds_code_execute;
    data casuser.test;
    x=:value:;
    output;
    run;
    endsource; print "-----ds_code_execute value------";
    print ds_code_execute;
    describe ds_code_execute;
    quit;

    Since the SOURCE block stores a string, you can use the tranwrd function to replace a string in the SOURCE block. The key is to making the string you want to replace unique. Here I made the string I want to be replaced called :value:.

    I also noticed you used a macro loop. There is no need to do that in CASL. You can use the CASL loop.

    proc cas;
    source ds_code_execute;
    data casuser.test;
    x=:value:;
    output;
    run;
    endsource;* No need for a macro loop. CASL has a looping feature *;

    do i=1 to 5; * make the i variable a string *;
    make_i_a_string = putn(i,'1.');
    print char_var; * create a new variable with the data step code and replace whatever you want based on the unique string you used in the data step code *;
    new_ds_code=tranwrd(ds_code_execute,':value:',char_var);
    print '------------------------';
    print new_ds_code; end;
    run;

  2. Great to know a variable created by the source block can be used by WHERE option. I wish in the near future the source block can be stacked, like a do loop or something. Thanks Peter

    • Peter Styliadis
      Peter Styliadis on

      Thanks Jason. Can you elaborate a bit on what you mean by stacked? Maybe a scenario? There might be a way to do what you are looking for.

      • I was running a parallel session a few days ago which has the layout like this:
        proc cas;
        sessions={};
        source casl_code;
        %do i=1 %to 5;
        sessions[&i.] = create_parallel_session();
        sessionProp.setSessOpt session=sessions[&i.] / caslib="&caslib";
        datastep.runcode session=sessions[&i.] async="&file."/code=long_program /*<-- Here I prefer to use a source block to define this long_program inside of the same %do loop since it is contingent on each &i.*/
        %end;
        endsource;
        sccasl.runCasl / code = casl_code;

        Do you have a good solution for that?

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top