Welcome to the seventh installment in my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, how to execute CAS actions, and how to work with the results. Now it's time to learn how to filter CAS tables.
Load and explore data
First, I'll load the cars.csv file into memory from the sassfotware GitHub page. I'll start by using my CAS connection object conn, followed by the upload_file method from the SWAT package. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. The data in this example is small, for training purposes. Processing data in the CAS server is typically reserved for larger data.
tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', casout={'name':'cars', 'caslib':'casuser'}) |
The response from the command is minimal, but there's much going on. Specifically:
- Cloud Analytic Services renders the uploaded file available as table CARS in caslib CASUSER(Peter).
- The table CARS is created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services.
Next, I'll view the type and value of the tbl variable to confirm it's a CASTable object that references the CARS table in the CAS server.
display(type(tbl), tbl) # and the command results: swat.cas.table.CASTable CASTable('CARS', caslib='CASUSER(Peter)') |
The results show tbl is a CASTable object and references the CARS table in the CAS server. Now, when I execute methods on the tbl object, the processing occurs in CAS.
Further, let's preview the CAS table using the SWAT package head method on the tbl object.
tbl.head() |
The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame as expected.
Finally, I'll execute the SWAT shape attribute to view the number of rows and columns in the CAS table.
tbl.shape # and the results (428, 15) |
Notice, the CARS CAS table contains 428 rows and 15 columns.
Filter a CAS Table
Now that we have created the table in the caslib and are familiar with the data, let's learn how to filter CAS tables!
Python Technique
First, I'll start with the traditional Pandas filtering technique. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. This will filter the CAS table for rows where the Make of the car is Acura or Toyota,
makeFilter = (tbl['Make']=='Acura') | (tbl['Make']=='Toyota') tbl[makeFilter].shape # and the results (35, 15) |
The results show that 35 rows in the CARS CAS table have the Make Toyota or Acura.
Query Method
You can also filter a CAS table using the query method in the SWAT package. Wait a minute, does this look familiar? I mentioned SWAT blends the world of Pandas as SAS. I'll again filter for a Make of Toyota or Acura .
tbl.query("Make = 'Acura' or Make = 'Toyota'").shape # and the results (35, 15) |
Notice, the results are the same.
isin Method
All right. Sorry, but again I'll show you another familiar method from the SWAT package. Since I am filtering for specific categories, I can also use the SWAT isin method, which works similarly to the Pandas isin method!
tbl[tbl.Make.isin(['Acura', 'Toyota'])].shape # and the results (35, 15) |
Notice, again the results are the same.
CAS Table WHERE Parameter
Lastly, I'll show you a specific CAS technique, which is very efficient if you need to filter data for multiple methods or actions. To begin, let's display the type and value of the tbl object again.
display(type(tbl), tbl) # and the results swat.cas.table.CASTable CASTable('CARS', caslib='CASUSER(Peter)') |
Notice the tbl variable is a CASTable object, which is a reference to a CAS table in the CAS server, not an actual table. In this example, it references the CARS table in the Casuser caslib.
The CASTable object enables you to add a variety of parameters to the object to filter, create columns, group and more. In this example, I'll add the where parameter to the CASTable object to filter for car makes of Toyota and Acura. Then I'll display the value of the CASTable object again.
tbl.where = 'Make = "Acura" or Make = "Toyota"' display(tbl) |
CASTable('CARS', caslib='CASUSER(Peter)', where='Make = "Acura" or Make = "Toyota"')
Notice the CASTable still references the CARS table in the Casuser caslib, but the where parameter has been added to the object. The where parameter is applied when the object is used with an action or method.
Let's check it out. Here I'll specify the tbl variable (the CASTable object) then the shape attribute.
tbl.shape # and the results (35, 15) |
Notice the results again show 35 cars are either Toyota or Acura.
Now, what is nice about using this method is if I want to continue to use this filter with other methods or actions, all I need to do is use the tbl object. Perhaps you want the value counts of each Make? The SWAT package has the value_counts method available.
tbl.Make.value_counts() # and the results Toyota 28 Acura 7 |
The response displays the counts only for rows where Make is Acura or Toyota. This occurs because the where parameter is applied to the CAS table when the value_counts method is executed.
Delete the WHERE parameter
Lastly, if you want to delete the where parameter from the CASTable object, use the del_params method.
tbl.del_params('where') display(tbl) # and the results CASTable('CARS', caslib='CASUSER(Peter)') |
Once the parameter is deleted you can revert back to analyzing the entire table. Here I'll use the shape attribute again to view how many rows are in the original CAS table.
tbl.shape # and the results (428, 15) |
Summary
SAS Viya offers various filtering options for CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.