This is a continuation of my previous blog post on SAS Data Studio and the Code transform. In this post, I will review some additional examples of using the Code transform in a SAS Data Studio data plan to help you prepare your data for analytic reports and/or models.
Create a Unique Identifier Example
The DATA step code below combines the _THREADID_ and the _N_ variables to create a UniqueID for each record.
The variable _THREADID_ returns the number of the thread in which the DATA step is running in a server session. The variable _N_ is an automatic variable that counts the iterations of the DATA step as it loops through the rows of an input data set. _N_ is initially set to 1 and increases by 1 each time the DATA step loops past the DATA statement, which happens once for every row of input data. Because the DATA step is a built-in loop that iterates through each row in a table, _N_ can serve as a row counter in this case. Each thread keeps its own _N_ count, so _N_ alone is not unique across threads; combining it with _THREADID_ makes the identifier unique.
_THREADID_ and _N_ are created automatically by the SAS DATA step. They are held in memory during processing and are not written to the output table unless you assign them to another variable. For more information on automatic DATA step variables, refer to the corresponding help topic.
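The original listing is not reproduced here, so the following is only a minimal sketch of how such a UniqueID could be built. It assumes the Code transform supplies the {{_dp_inputTable}}, {{_dp_inputCaslib}}, {{_dp_outputTable}}, and {{_dp_outputCaslib}} placeholders for the plan's input and output tables; the underscore separator and the length of 32 are illustrative choices, not taken from the original code.

```sas
/* Sketch only: the {{_dp_...}} placeholders are assumed to be the
   table variables that the Code transform substitutes at run time. */
data {{_dp_outputTable}} (caslib={{_dp_outputCaslib}});
   set {{_dp_inputTable}} (caslib={{_dp_inputCaslib}});
   length UniqueID $ 32;
   /* _N_ restarts at 1 on every thread, so prefix it with the thread
      number; CATX converts both numbers to text and joins them. */
   UniqueID = catx('_', _threadid_, _n_);
run;
```

The separator keeps thread 1, row 11 ("1_11") distinct from thread 11, row 1 ("11_1"), which plain concatenation would not.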
Below are the results of the code above, which creates the UniqueID column.
Cluster Records Example
The DATA step code below combines the _THREADID_ and the counter variables to create a unique ClusterNum for each BY group.
This code uses FIRST.variable to increase the counter at the beginning of each new grouping. FIRST.variable and LAST.variable are variables that CAS creates for each BY variable. CAS sets FIRST.variable to 1 when it is processing the first observation in a BY group, and sets LAST.variable to 1 when it is processing the last observation in a BY group; otherwise their value is 0. These assignments enable you to take different actions, based on whether processing is starting for a new BY group or ending for a BY group. For more information, refer to the topic Identifying the First and Last Rows in BY-Groups.
Because groupings can be processed on different threads, and each thread keeps its own counter, the variable _THREADID_ is used to identify the thread. As in the previous example, _THREADID_ returns the number of the thread in which the DATA step is running in a server session and is created automatically by the SAS DATA step.
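As a hedged illustration of the approach described above, a cluster number could be assigned along these lines. CustomerName is a hypothetical BY column chosen for the example, and the {{_dp_...}} placeholders are assumed to be the table variables that the Code transform provides.

```sas
/* Sketch only: CustomerName is a hypothetical grouping column. */
data {{_dp_outputTable}} (caslib={{_dp_outputCaslib}});
   set {{_dp_inputTable}} (caslib={{_dp_inputCaslib}});
   by CustomerName;
   length ClusterNum $ 32;
   /* The sum statement initializes counter to 0 and retains it across
      rows; it is incremented once at the start of each BY group. */
   if first.CustomerName then counter + 1;
   /* counter is per thread, so add the thread number to keep
      cluster numbers unique across threads. */
   ClusterNum = catx('_', _threadid_, counter);
   drop counter;
run;
```

Every row in the same BY group receives the same ClusterNum, because the counter changes only when FIRST.CustomerName is 1.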
De-duplication Example
The DATA step code below outputs only the last record of each BY group, thereby de-duplicating the data set so that it contains one record per grouping.
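A minimal sketch of this pattern, again using the hypothetical CustomerName BY column and the assumed {{_dp_...}} placeholders, might look like this:

```sas
/* Sketch only: CustomerName is a hypothetical grouping column. */
data {{_dp_outputTable}} (caslib={{_dp_outputCaslib}});
   set {{_dp_inputTable}} (caslib={{_dp_inputCaslib}});
   by CustomerName;
   /* Subsetting IF: a row is written only when it is the last row
      of its BY group, so each group contributes exactly one row. */
   if last.CustomerName;
run;
```

Replacing last.CustomerName with first.CustomerName would keep the first record of each group instead.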
Below are the de-duplication results on the data set used in the previous Cluster Records Example section.
For more information about the DATA step, refer to the Dictionary of SAS DATA Step Statements help topic.
Write Results to a File Example
The CASL code below writes the current results at this point in the data plan flow to an Excel file. This is accomplished using the table.save CAS action.
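The call could be sketched roughly as follows. This assumes that _dp_inputTable and _dp_inputCaslib are the CASL variables that Data Studio supplies for the incoming table, and that the file type is inferred from the .xlsx extension. The step that passes the rows on to the plan's output table is omitted here.

```sas
/* CASL sketch only: _dp_inputTable and _dp_inputCaslib are assumed
   to be provided by the Code transform. */
table.save /
   table   = {name=_dp_inputTable, caslib=_dp_inputCaslib},
   name    = "customers2.xlsx",   /* file name from this example */
   caslib  = "Public",            /* target CAS library */
   replace = true;                /* overwrite an existing file */
```

Setting replace to true lets the plan be rerun without failing when the file already exists.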
Below is the resulting customers2.xlsx file in the Public CAS library.
For more information on the available action sets, refer to the SAS Viya 3.3: System Programming Guide. For more information about CASL, refer to the SAS® Cloud Analytic Services 3.3: CASL Reference guide.
For more information on SAS Data Studio and the Code transform, please refer to this documentation.