Welcome back to my SAS Users blog series CAS Action! - a series on fundamentals. I've broken the series into logical, consumable parts. If you'd like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets - a brief intro. Or if you'd like to see other topics in the series, see the overview page. Otherwise, let's learn how to filter rows in CAS tables.
In this example, I will use the CAS procedure to filter rows in CAS tables for use in CAS actions. I could execute similar code in Python, R and more with some slight changes to the syntax for the specific language. The other languages also include specific methods to accomplish the same objective.
I'll start by executing the following code to create a CAS table named PRODUCTS in the Casuser caslib with 80 million rows. Then, I'll create a variable named productsTbl to reference the PRODUCTS CAS table, and use that variable as the value to the table parameter in the numRows and fetch CAS actions.
proc cas; productsTbl = {name = 'products', caslib = 'casuser'}; simple.numRows / table = productsTbl; table.fetch / table = productsTbl; quit; |
The numRows action shows the PRODUCTS table has 80 million rows, and the fetch action shows the first 20 rows of the table.
Create a Simple Filter
To filter a CAS table you add the where sub parameter to the castable reference with the where expression as a string. The string is similar to using the WHERE statement in SAS.
Here, I'll add the where key to the productsTbl dictionary that references the CAS table, and filter for all rows where the Product column equals Hats. The where expression is only applied when an action is executed on the productsTbl variable (the CAS table reference). The underlying CAS table is not modified.
proc cas; productsTbl = {name = 'products', caslib = 'casuser', where = 'Product = "Hats"'}; simple.numRows / table = productsTbl; table.fetch / table = productsTbl; quit; |
After executing the numRows and fetch actions, the results show that there are 8,001,336 rows that contain the product Hats.
Use a SAS Function to Filter a CAS Table
Within the where expression you can also use most SAS Functions. Here, I'll use the UPCASE function to upper case all values in the Product column to standardize the case, then filter for rows with the value HATS. The results will be the same as the previous section since there are no underlying case issues in the CAS table. Where you would see a difference would be if the table contained mixed case versions of Hats, such as hats, HAts, hATs, etc. Without the function, the query is case sensitive and would only pick up Hats. All iterations of Hats would be picked up with using the UPCASE function.
proc cas; productsTbl = {name = 'products', caslib = 'casuser', where = 'upcase(Product) = "HATS"'}; simple.numRows / table = productsTbl; table.fetch / table = productsTbl; quit; |
Multiple WHERE Expressions
Just like the WHERE statement in SAS, you can specify two or more where expressions with a logical operator. Here I'll filter the CAS table for all rows where the Product column contains the value HATS, and a StoreID less than 15. The entire where expression is stored as a string.
proc cas; productsTbl = {name = 'products', caslib = 'casuser', where = 'upcase(Product) = "HATS" and StoreID < 15'}; simple.numRows / table = productsTbl; table.fetch / table = productsTbl; quit; |
Notice that the results show that 1,789,106 rows have the product hats with a store ID less than 15.
Create a Calculated Column as a Filter
You can create a calculated expression to use as a filter. The calculated expression is not added as a column to the table output, instead it is used to filter the CAS table during the execution of the actions. Here I'll multiple the Price by Quantity, and filter for all rows greater than 40.
proc cas; productsTbl = {name = 'products', caslib = 'casuser', where = 'upcase(Product) = "HATS" and StoreID < 15 and Price * Quantity > 40'}; simple.numRows / table = productsTbl; table.fetch / table = productsTbl; quit; |
And the log and results:
Notice in the results the calculated column is not output, it was only used to filter the rows. With all three where expressions, we now have a total of 436,728 rows.
Storing the Where Expressions in a Variable
Lastly, I want to clean up my code. Instead of adding the where expressions as a single string, I can use a SOURCE block in the CAS language. The SOURCE block enables embedding text in the program and assigning it to a given variable. It avoids any possible quoting issues and helps compartmentalize your code. Here, I'll name my SOURCE block filter and add the where expressions. Then, I'll use the filter variable as the value to the where key. The results are the same as the previous code.
proc cas; source filter; upcase(Product) = "HATS" and StoreID < 15 and Price * Quantity > 40; endsource; productsTbl = {name = 'products', caslib = 'casuser', where = filter}; simple.numRows / table = productsTbl; table.fetch / table = productsTbl; quit; |
Summary
In summary, using the where sub-parameter allows you to easily filter your tables for use in CAS actions. A few key points:
- The where sub parameter takes a string as its value and is similar to using the WHERE expression in SAS.
- Using the where sub-parameter does not modify the original CAS table. It is simply applied when executing an action on the CAS table reference.
- Within the where expression you can use most SAS functions.
- If you create a dictionary variable with the CAS table reference, you can then use that variable in a variety of actions within the table parameter.
- Using a SOURCE block enables you to embed your where expressions in a variable without using a string.
Additional resources
fetch Action
SAS® Cloud Analytic Services: Fundamentals
Code
4 Comments
Hi Jason. To confirm, you want to store DATA step code in a SOURCE block, then replace a specific part of that DATA step in the source block with the value of i? If I am understanding correctly, this code below should help.
A SOURCE block simply stores text. See this code:
* Notice that the source block simply stores a string in a variable *;
proc cas;
source ds_code_execute;
data casuser.test;
x=:value:;
output;
run;
endsource; print "-----ds_code_execute value------";
print ds_code_execute;
describe ds_code_execute;
quit;
Since the SOURCE block stores a string, you can use the tranwrd function to replace a string in the SOURCE block. The key is to making the string you want to replace unique. Here I made the string I want to be replaced called :value:.
I also noticed you used a macro loop. There is no need to do that in CASL. You can use the CASL loop.
proc cas;
source ds_code_execute;
data casuser.test;
x=:value:;
output;
run;
endsource;* No need for a macro loop. CASL has a looping feature *;
do i=1 to 5; * make the i variable a string *;
make_i_a_string = putn(i,'1.');
print char_var; * create a new variable with the data step code and replace whatever you want based on the unique string you used in the data step code *;
new_ds_code=tranwrd(ds_code_execute,':value:',char_var);
print '------------------------';
print new_ds_code; end;
run;
Great to know a variable created by the source block can be used by WHERE option. I wish in the near future the source block can be stacked, like a do loop or something. Thanks Peter
Thanks Jason. Can you elaborate a bit on what you mean by stacked? Maybe a scenario? There might be a way to do what you are looking for.
I was running a parallel session a few days ago which has the layout like this:
proc cas;
sessions={};
source casl_code;
%do i=1 %to 5;
sessions[&i.] = create_parallel_session();
sessionProp.setSessOpt session=sessions[&i.] / caslib="&caslib";
datastep.runcode session=sessions[&i.] async="&file."/code=long_program /*<-- Here I prefer to use a source block to define this long_program inside of the same %do loop since it is contingent on each &i.*/
%end;
endsource;
sccasl.runCasl / code = casl_code;
Do you have a good solution for that?