How to Emulate DATA Step DESCENDING BY Statements in SAS® Cloud Analytic Services (CAS)

0

DATA Step BY Statements

DATA Step is a very powerful language that SAS and Open Source programmers leverage to stage data for the analytical life cycle. A popular technique is to use the DESCENDING option on BY variables to identify the largest value. Let’s review the example in Figure 1:

  • On line 74 we are using the descending option on the BY statement for the numeric variable MSRP. The reason we are doing this is so we can identify the most expensive car for each make of car in our data set.
  • On line 79 we group our data by MAKE of car.
  • On line 80 we leverage the FIRST. statement on the subsetting IF statement to output the first record for each MAKE. In Figure 2 we can review the results.


Figure 1. Descending BY Statement


Figure 2. Listing of Most Expensive Cars by MAKE

What is CAS?

CAS is SAS Viya’s in-memory engine that processes data and logic in a distributed computing paradigm. When working with CAS tables we can emulate the DESCENDING BY statement by creating a CAS View which will then become the source table to our DATA Step. Let’s review Figure 3:

  • On line 79 we will leverage the CASL (SAS® Cloud Analytic Services Language) action set TABLE with the action VIEW to create the CAS View that will be used as our source table in the DATA Step.
  • On lines 80 and 81 we will store our CAS View in the CASUSER CASLIB with the name of DESCENDING.
  • On line 82 and 83 we use the TABLES statement to specify the input CAS table to our CAS View.
  • On line 84 we use the VARLIST statement to identify the columns from the input table we want in our CAS View.
  • On lines 85 we create a new variable for our CAS View using the computedVars statement,
  • On line 86 we provide the math for our new variable N_MSRP. N_MSRP is the negated value of the input CAS table variable MSRP. Note: This emulation only works for numeric variables. For character data I suggest using LAST. processing which you can review in this blog post.


Figure 3. Emulating DESCENDING BY Statement for Numeric Variables

Now that we have our CAS View with its new variable N_MSRP, we can move on to the DATA Step code in Figure 3.

  • On line 92 the SET statement specifies the source to our DATA Step; CAS View CASUSER.DESCENDING
  • On line 83 we leverage the BY Statement to group our data in ascending order for the CAS View variables MAKE and N_MSRP. Because N_MSRP is in ascending order our original variable MSRP is in DESCENDING order.
  • On line 94 we use a subsetting IF statement to output the first occurrence of each MAKE.

Figure 4 is a listing of our new CAS table CASUSER.DESCENDING2 and displays the most expensive car for each make of car.


Figure 4. Listing of Most Expensive Cars by MAKE

Template for Creating a CAS View

/* Create a CAS view */
/* For each DESCENDING numeric create a new variable(s) */
/* The value of the new variable(s) is the negated value */
/* of the original DESCENDING BY numeric variable(s) */
proc cas;
   table.view / replace = true
   caslib='casuser'
   name='descending'
   tables={{
      name='cars'
      varlist={'msrp' 'make'},
      computedVars={{name='n_msrp'}},
      computedVarsProgram='n_msrp = -(msrp)'
   }};
run;
quit;
 
data casuser.descending2;
   set casuser.descending;
   by make n_msrp ;
   if first.make ;
run;
 
proc print data=casuser.descending2;
title "Most Expensive Cars";
run;

Conclusion

It is a very common coding technique to process data with a DESCENDING BY statement using DATA Step. With Viya 3.5 the DESCENDING BY statements is supported, for numeric and character data in DATA Step, with this caveat: DESCENDING works on all but the first BY variable on the BY statement. For earlier versions of SAS Viya this emulation is the best practices for numeric data that you want in DESCENDING order.

Tags
Share

About Author

Steven Sober

Advisory Solutions Architect, Data Management

Steven is responsible for empowering SAS sales and system engineers in the positioning and integration of SAS® Viya, SAS® In-Database Code Accelerator for Hadoop, SAS® Scalable Performance Data Server, and SAS® Grid Manager. During his tenure at SAS, Steven has traveled globally working with customers to quickly integrate the SAS system to automate processes. Before joining SAS in 1989 as the first employee of SAS Switzerland, Steven worked for the U.S. Geological Survey, Water Resources Division, assisting hydrologists to leverage the SAS system to derive intelligence on ground and surface water in Southern Colorado. He holds a Bachelor of Science degree in Computer Science from the University of Southern Colorado in Pueblo, Colorado.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top