Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In this post I'll discuss how to execute SQL with the Python SWAT package in the distributed CAS server.
Prepare and load data to the CAS server
I created a Python function named createDemoData to prepare and load data to the CAS server. The function takes your CAS connection object as the parameter. The function will :
- load the RAND_RETAILDEMO.sashdat file from the Samples caslib into memory in the Casuser caslib. This is the main table.
- create a DataFrame on the Python client and load it to the CAS server as the table LOYALTY_LOOKUP_TBL in the Casuser caslib. This is the lookup table.
- run the tableInfo CAS action to confirm the CAS tables RAND_RETAILDEMO and LOYALTY_LOOKUP_TBL are in-memory in the Casuser caslib.
## Import packages import swat import os import pandas as pd import numpy as np conn = ## Specify your CAS connection information def createDemoData(connectionObject): '''Specify your CAS connectin object as the parameter''' connectionObject.loadTable(path = 'RAND_RETAILDEMO.sashdat', caslib = 'samples', casout = { 'caslib':'casuser', 'replace':True }) myData = { 'loyalty_id':[0,1], 'loyalty_value':['No','Yes'], } df = pd.DataFrame(myData) connectionObject.upload_frame(df, casout = { 'name':'loyalty_lookup_tbl', 'caslib':'casuser', 'replace':True }) tableInfo = connectionObject.tableInfo(caslib = 'casuser') return tableInfo ## Execute function using CAS connection object createDemoData(conn) # and the results NOTE: Cloud Analytic Services made the file RAND_RETAILDEMO.sashdat available as table RAND_RETAILDEMO in caslib CASUSER(Peter). NOTE: Cloud Analytic Services made the uploaded file available as table LOYALTY_LOOKUP_TBL in caslib CASUSER(Peter). NOTE: The table LOYALTY_LOOKUP_TBL has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services. |
Load the FedSQL CAS action set
To use SQL with the Python SWAT package in the CAS server you will need to load the FedSQL CAS action set.
conn.loadActionSet('fedSQL') |
To view more information about the FedSQL action set you can run the help method.
conn.help(actionset='fedSQL') |
The FedSQL action set contains a single action named execDirect. We can use this action to submit SQL queries to the distributed CAS server.
Perform a simple query
Let's preview our CAS tables using a simple query. First I'll view 5 rows from the RAND_RETAILDEMO CAS table from the Casuser caslib. I'll store my query as a string in the variable previewRetail. Within the string I'll add the SELECT, FROM and LIMIT clauses. In the FROM clause I'll use a two-level naming convention to specify the table. This specifies the caslib and CAS table name. Lastly, execute the execDirect action and add the query parameter with the value previewRetail (the query as a string).
previewRetail = ''' select * from casuser.RAND_RETAILDEMO limit 5; ''' conn.execDirect(query = previewRetail) |
The query returns five rows from the RAND_RETAILDEMO CAS table from the Casuser caslib. Notice the loyalty_card column. It contains a 0 and 1. We will use that column as our join criteria to obtain the loyalty values later in the post.
I'll do the same thing to preview the LOYALTY_LOOKUP_TBL CAS table from the Casuser caslib.
previewLoyalty = ''' select * from casuser.LOYALTY_LOOKUP_TBL limit 5; ''' conn.execDirect(query = previewLoyalty) |
This query returns two rows, one for each loyalty_id. We will use this column to join with the loyalty_card column from the previous table to obtain the loyalty_value.
Perform a join
Let's perform a inner join using the following:
- the RAND_RETAILDEMO CAS table from the Casuser caslib using the loyalty_card column
- the LOYALTY_LOOKUP_TBL CAS table from the Casuser caslib using the loyalty_id column
- keep the CustID, age, Department, loyalty_card and loyalty_value columns
- limit the join to five rows to confirm it works as expected
joinData = ''' select retail.CustID, retail.age, retail.Department, retail.loyalty_card, loyalty.loyalty_value from casuser.RAND_RETAILDEMO as retail inner join casuser.LOYALTY_LOOKUP_TBL as loyalty on retail.loyalty_card = loyalty.loyalty_id limit 5; ''' conn.execDirect(query = joinData) |
The results show that the join was successful and returns a CASResults object to the Python client.
Create a new CAS table using the join
Now that we know the join works, let's join the full table and create a new CAS table with the results. I'll create a new CAS table named LOYALTY_FINAL and place it in the Casuser caslib. I'll add the replace=True option to replace the CAS table if it already exists.
createTable = ''' create table loyalty_final{options replace=True} as select retail.CustID, retail.age, retail.Department, retail.loyalty_card, loyalty.loyalty_value from casuser.RAND_RETAILDEMO as retail inner join casuser.LOYALTY_LOOKUP_TBL as loyalty on retail.loyalty_card = loyalty.loyalty_id; ''' conn.execDirect(query = createTable) # and the results NOTE: Table LOYALTY_FINAL was created in caslib CASUSER(Peter) with 930046 rows returned. |
The results return a note that the new table was created successfully. Once thing to know when joining and creating tables with SQL in CAS is the CAS tables are not required to be in the same caslibs. Just make sure to use the two-level naming conventions to specify the exact table you want.
Next, I'll execute the tableInfo action to view available CAS tables.
conn.tableInfo(caslib = 'casuser') |
The results show that there are three CAS tables available. The original two and the newly joined CAS table LOYALTY_FINAL. I'll preview the LOYALTY_FINAL CAS table.
loyaltyInfo = conn.CASTable('LOYALTY_FINAL',caslib = 'casuser') loyaltyInfo.head() |
Summary
The SWAT package blends the world of pandas and CAS to process your distributed data. In this example I focused on using Python to execute FedSQL in the CAS server. Being able to execute SQL with Python in the CAS server gives you another set of tools when working with your distributed data.