Getting Started with Python Integration to SAS® Viya® - Part 16 - Execute SQL

0

Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In this post I'll discuss how to execute SQL with the Python SWAT package in the distributed CAS server.

Prepare and load data to the CAS server

I created a Python function named createDemoData to prepare and load data to the CAS server. The function takes your CAS connection object as the parameter. The function will :

  • load the RAND_RETAILDEMO.sashdat file from the Samples caslib into memory in the Casuser caslib. This is the main table.
  • create a DataFrame on the Python client and load it to the CAS server as the table LOYALTY_LOOKUP_TBL in the Casuser caslib. This is the lookup table.
  • run the tableInfo CAS action to confirm the CAS tables RAND_RETAILDEMO  and LOYALTY_LOOKUP_TBL  are in-memory in the Casuser caslib.
## Import packages
import swat
import os
import pandas as pd
import numpy as np
 
conn = ## Specify your CAS connection information
 
 
def createDemoData(connectionObject):
    '''Specify your CAS connectin object as the parameter'''
 
    connectionObject.loadTable(path = 'RAND_RETAILDEMO.sashdat', caslib = 'samples',
                               casout = {
                                   'caslib':'casuser',
                                   'replace':True
                               })
 
    myData = {
        'loyalty_id':[0,1],
        'loyalty_value':['No','Yes'],
    }
 
    df = pd.DataFrame(myData)
    connectionObject.upload_frame(df, 
                                  casout = {
                                        'name':'loyalty_lookup_tbl',
                                        'caslib':'casuser',
                                        'replace':True
                                   })
 
    tableInfo = connectionObject.tableInfo(caslib = 'casuser')
    return tableInfo
 
 
## Execute function using CAS connection object
createDemoData(conn) 
 
 
# and the results
NOTE: Cloud Analytic Services made the file RAND_RETAILDEMO.sashdat available as table RAND_RETAILDEMO in caslib CASUSER(Peter).
NOTE: Cloud Analytic Services made the uploaded file available as table LOYALTY_LOOKUP_TBL in caslib CASUSER(Peter).
NOTE: The table LOYALTY_LOOKUP_TBL has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.

Load the FedSQL CAS action set

To use SQL with the Python SWAT package in the CAS server you will need to load the FedSQL CAS action set.

conn.loadActionSet('fedSQL')

To view more information about the FedSQL action set you can run the help method.

conn.help(actionset='fedSQL')

The FedSQL action set contains a single action named execDirect. We can use this action to submit SQL queries to the distributed CAS server.

Perform a simple query

Let's preview our CAS tables using a simple query.  First I'll view 5 rows from the RAND_RETAILDEMO CAS table from the Casuser caslib. I'll store my query as a string in the variable previewRetail. Within the string I'll add the SELECT, FROM and LIMIT clauses. In the FROM clause I'll use a two-level naming convention to specify the table. This specifies the caslib and CAS table name. Lastly, execute the execDirect action and add the query parameter with the value previewRetail (the query as a string).

previewRetail = '''
    select *
        from casuser.RAND_RETAILDEMO
        limit 5;
'''
conn.execDirect(query = previewRetail)

The query returns five rows from the RAND_RETAILDEMO CAS table from the Casuser caslib. Notice the loyalty_card column. It contains a 0 and 1. We will use that column as our join criteria to obtain the loyalty values later in the post.

I'll do the same thing to preview the LOYALTY_LOOKUP_TBL CAS table from the Casuser caslib.

previewLoyalty = '''
    select *
        from casuser.LOYALTY_LOOKUP_TBL
        limit 5;
'''
conn.execDirect(query = previewLoyalty)

This query returns two rows, one for each loyalty_id. We will use this column to join with the loyalty_card column from the previous table to obtain the loyalty_value.

Perform a join

Let's perform a inner join using the following:

  • the RAND_RETAILDEMO CAS table from the Casuser caslib using the loyalty_card column
  • the LOYALTY_LOOKUP_TBL CAS table from the Casuser caslib using the loyalty_id column
  • keep the CustID, age, Department, loyalty_card and loyalty_value columns
  • limit the join to five rows to confirm it works as expected
joinData = '''
    select retail.CustID,
           retail.age,
           retail.Department,
           retail.loyalty_card,
           loyalty.loyalty_value
        from casuser.RAND_RETAILDEMO as retail inner join
             casuser.LOYALTY_LOOKUP_TBL as loyalty
        on retail.loyalty_card = loyalty.loyalty_id
        limit 5;
'''
conn.execDirect(query = joinData)

The results show that the join was successful and returns a CASResults object to the Python client.

Create a new CAS table using the join

Now that we know the join works, let's join the full table and create a new CAS table with the results. I'll create a new CAS table named LOYALTY_FINAL and place it in the Casuser caslib. I'll add the replace=True option to replace the CAS table if it already exists.

createTable = '''
    create table loyalty_final{options replace=True} as
    select retail.CustID,
           retail.age,
           retail.Department,
           retail.loyalty_card,
           loyalty.loyalty_value
        from casuser.RAND_RETAILDEMO as retail inner join
             casuser.LOYALTY_LOOKUP_TBL as loyalty
        on retail.loyalty_card = loyalty.loyalty_id;
'''
 
conn.execDirect(query = createTable)
 
# and the results
NOTE: Table LOYALTY_FINAL was created in caslib CASUSER(Peter) with 930046 rows returned.

The results return a note that the new table was created successfully. Once thing to know when joining and creating tables with SQL in CAS is the CAS tables are not required to be in the same caslibs. Just make sure to use the two-level naming conventions to specify the exact table you want.

Next, I'll execute the tableInfo action to view available CAS tables.

conn.tableInfo(caslib = 'casuser')


The results show that there are three CAS tables available. The original two and the newly joined CAS table LOYALTY_FINAL. I'll preview the LOYALTY_FINAL CAS table.

loyaltyInfo = conn.CASTable('LOYALTY_FINAL',caslib = 'casuser')
loyaltyInfo.head()

Summary

The SWAT package blends the world of pandas and CAS to process your distributed data. In this example I focused on using Python to execute FedSQL in the CAS server. Being able to execute SQL with Python in the CAS server gives you another set of tools when working with your distributed data.

Additional and related resources

Share

About Author

Peter Styliadis

Technical Training Consultant

Peter Styliadis is a Technical Training Consultant at SAS on the Foundations team in the Education division. The team focuses on course development and customer training. Currently Peter spends his time working with SAS Programming, Structured Query Language (SQL), SAS Visual Analytics, Programming in Viya and Python. Peter is from the Rochester/Buffalo New York area and does not miss the cold weather up north. After work, Peter can be found spending time with his wife, child and three dogs, as well as hiking and spending time at home learning something new. Prior to joining SAS, Peter worked at his family's restaurant (think My Big fat Greek Wedding), worked in customer service, then became a High School Math and Business teacher. To connect with Peter, feel free to connect on LinkedIn.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top