Getting Started with Python Integration to SAS® Viya® - Part 8 - Creating Calculated Columns

0

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS serverhow to execute CAS actions, and how to filter CAS tables. Now it's time to focus on creating calculated columns on a CAS table.

Load and explore data

First, I'll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I'll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server's massively parallel processing environment is typically reserved for larger data.

tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', 
                       casout={'name':'cars', 
                               'caslib':'casuser'})
display(type)
# and the command results:
NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter).
NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
CASTable('CARS', caslib='CASUSER(Peter)')

The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.

Next, let's preview the CARS CAS table using the SWAT package head method on the tbl object.

tbl.head()

The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.

Create Calculated Columns

Now that we have created the table in the caslib and are familiar with the data, let's learn how to create calculated columns in CAS tables!

Python Technique

First, I'll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I'll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.

tbl['NewCol1'] = tbl.Origin.str.upper()
tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round()
tbl.head()

The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.

display(tbl)
# and the command results:
{'name': 'CARS',
 'caslib': 'CASUSER(Peter)',
 'computedvars': ['NewCol1', 'NewCol2'],
 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = ....

eval Method

You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.

Below I'll create the same two columns, but before I do, I'll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I'll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.

tbl = conn.CASTable('cars', caslib='casuser')
 
tbl.eval('NewCol1 = upcase(Origin)')
tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)')
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

CAS Table COMPUTEDVARSPROGRAM Parameter

Lastly, I'll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.

I'll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I'll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.

tbl = conn.CASTable('cars', caslib='casuser')
createNewCol1 = 'NewCol1 = upcase(Origin);'
createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);'
tbl.computedVarsProgram = createNewCol1 + createNewCol2
tbl.head()


The results show the two new columns NewCol1 and NewCol2 were created successfully.

Summary

SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.

Additional and related resources

Share

About Author

Peter Styliadis

Technical Training Consultant

Peter Styliadis is a Technical Training Consultant at SAS on the Foundations team in the Education division. The team focuses on course development and customer training. Currently Peter spends his time working with SAS Programming, Structured Query Language (SQL), SAS Visual Analytics, Programming in Viya and Python. Peter is from the Rochester/Buffalo New York area and does not miss the cold weather up north. After work, Peter can be found spending time with his wife, child and three dogs, as well as hiking and spending time at home learning something new. Prior to joining SAS, Peter worked at his family's restaurant (think My Big fat Greek Wedding), worked in customer service, then became a High School Math and Business teacher. To connect with Peter, feel free to connect on LinkedIn.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top