Welcome to the first post for the Getting Started with Python Integration to SAS Viya series! With the popularity of the Python programming language for data analysis and SAS Viya's ability to integrate with Python, I thought, why not create tutorials for users integrating the two?
To begin the series, I want to talk about the most important step: making a connection to SAS Viya through your favorite Python client. In the examples I will use a Jupyter notebook, but the method is the same on any Python client interface. Before diving into code and connections, I want to provide a brief, high-level overview of SAS Viya.
What is SAS Viya?
SAS Viya extends the SAS Platform, operates in the cloud (as well as in hybrid and on-prem solutions) and is open source-friendly. For better performance, SAS Viya operates on in-memory data, removing the read/write data transfer overhead. Data processing and analytic procedures use the SAS Cloud Analytic Services (CAS), the engine behind SAS Viya. Further, it enables everyone in an organization to collaborate and work with data by providing a variety of products & solutions running in CAS.
What exactly is CAS? Let's consider the image below.
CAS distributes heavy workloads among multiple computing instances for fast and efficient processing. The environment consists of a controller and a set of worker nodes that store and process data. Let's take a simple example. A table with 300GB of data is uploaded to the CAS environment. The controller parcels out a 100GB chunk of the data to each of three worker nodes. The data is loaded into memory on each worker node, and each node processes its 100GB of data.
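To make the arithmetic in that example concrete, here is a toy sketch in plain Python. It only illustrates the idea of an even split. It is not how CAS actually partitions data, and the worker names are made up for the illustration.

```python
def distribute(total_gb, workers):
    """Split total_gb as evenly as possible across the given worker names."""
    base, extra = divmod(total_gb, len(workers))
    # The first `extra` workers take one additional GB when the split is uneven.
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(workers)}


# Hypothetical worker names; the 300GB table comes from the example above.
plan = distribute(300, ["worker-1", "worker-2", "worker-3"])
print(plan)  # {'worker-1': 100, 'worker-2': 100, 'worker-3': 100}
```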
Additionally, CAS uses modern, dynamic algorithms to rapidly perform analytical processing on data of any size. Throughout the series I will refer to the CAS distributed environment as the CAS server, or simply CAS.
For more information about the Cloud Analytic Services architecture visit SAS® Cloud Analytic Services: Fundamentals.
Furthermore, SAS Viya is open. Business analysts and data scientists can explore, prepare and manage data to provide insights, create visualizations or analytical models using the SAS programming language or a variety of open source languages like Python, R, Lua, or Java. Because of this, programmers can easily process data in CAS, using a language of their choice.
Now that the high-level overview of SAS Viya is out of the way, let's discuss the why.
Why do I want to integrate Python to SAS Viya?
When working with data on your local computer you are typically constrained to your computer's resources. For example, when you are working with smaller data (generally think around 1GB) you most likely will not have any resource issues. Alternatively, what if your data is 100GB? A terabyte? What do you do then?
The solution is simple: integrate Python to SAS Viya! At the highest level, the SAS Viya architecture is meant to work with large data your client machine cannot handle. You can load your large data into CAS, which distributes chunks of the data to each of the worker nodes. The data is loaded into memory on the worker nodes and processed there, with the work driven from your Python client. Sounds great, right? I haven't even told you the best part.
Many of the familiar Pandas methods are available through the Scripting Wrapper for Analytics Transfer (SWAT) package. The SWAT package provides functionality and syntax having the feel of open source code, but simply wraps up CAS actions to send to the server. CAS actions are small units of work the CAS server understands. They load and transform data, compute statistics, perform analytics and create output.
Compare the simple code samples below. You see the commands written in SAS, Python, or R native code. When sent from a client to CAS, SWAT translates the command, performs the action on the CAS server, and returns a response to the client.
Another great feature is the ability to transfer summarized data from the CAS server back to the client machine. Having data on the client machine allows you to use any familiar Python packages like Pandas, Matplotlib, Seaborn, scikit-learn and many more!
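As a rough sketch of what that looks like, the function below runs two Pandas-style calls against a table in CAS. It assumes conn is an already open SWAT connection (making one is covered later in this post) and that a table is already loaded on the server. The function name and the table and caslib names in the usage note are hypothetical.

```python
def summarize_on_server(conn, table_name, caslib):
    """Run pandas-style calls against a table that is already loaded in CAS."""
    # CASTable is only a reference to the server-side table; no data is downloaded here.
    tbl = conn.CASTable(table_name, caslib=caslib)
    preview = tbl.head(5)      # first five rows, returned to the client
    summary = tbl.describe()   # statistics computed in CAS; only the small result comes back
    return preview, summary
```

With a live connection, a call such as summarize_on_server(conn, "cars", "casuser") would return two small DataFrame-like results on the client, ready for packages like Matplotlib or Seaborn.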
So how do you get started?
Connecting to the CAS server
To connect to the CAS server, complete the following two steps:
Install the SWAT package
First, we need to install the SWAT package. The SWAT package provides a means for Python users to submit code to the CAS server. The SWAT package translates the Python code to CAS actions the server understands. Install the SWAT package using the pip command as follows:
pip install swat
Or, if you are using Anaconda:
conda install -c sas-institute swat
For more information on installing the SWAT package for a specific platform and version, visit the sassoftware/python-swat GitHub page or the documentation.
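If you want to confirm the installation worked before going further, one option is to ask Python's packaging metadata for the installed version. This is a small sketch using only the standard library (Python 3.8 or later); the helper name installed_version is my own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("swat")
print(f"swat {version}" if version else "swat is not installed in this environment")
```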
Make a connection to the CAS server
Next, it's time to make a connection from your local client to the CAS server. This step is required and there are some variations in how this is implemented. I have prepared two methods to connect to CAS. The first method employs a username and password. The second uses token authentication with the use of environment variables. Explore other authentication methods from links provided in the Additional Resources section at the end of this post.
Connecting to CAS using a username and password
First we'll import the SWAT package.
import swat
Next we'll use the swat.CAS constructor to create a connection object to the CAS server. I name this new connection object conn. Feel free to name the object whatever you would like. I've seen it named s in some documentation. I prefer conn since it's my connection to CAS.
The CAS constructor requires the host name and the listening port of the CAS controller. We also need to authenticate -- that is, we need to tell CAS who we are. In this example, we use a username and password. If you do not know any of this connection information, speak with your administrator.
conn = swat.CAS(hostname="http://server.demo.sas.com/cas-shared-default-http/", port=8777, username="student", password="Metadata0")
Let's quickly investigate the output of our conn object.
display(conn)
CAS('server.demo.sas.com', 8777, 'student', protocol='http', name='py-session-1', session='c8091979--483f-8cde-3c97a1f372a3')
We are now connected! Notice the conn object holds all of our connection information. We will use this connection object going forward.
Finally, let's look at the new object's type.
type(conn)
swat.cas.connection.CAS
Alert! I hope you noticed something. You should be thinking, "Did we just type our authentication information in plain text?" The answer is yes. However, when you are using password authentication, you should NEVER type your information in plain text. In this example I am using a training machine and I want to demonstrate the simplest connection procedure. For real-world scenarios, there are options like setting environment variables, creating an authinfo file, or creating a hashed version of your password. My recommendation is to follow your company policy for authentication.
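As one illustration of keeping secrets out of the notebook, the sketch below reads the username from an environment variable and falls back to an interactive prompt for the password. The variable names CAS_USERNAME and CAS_PASSWORD and the helper name are hypothetical choices of mine, not something SWAT defines, so adapt them to whatever your company policy prescribes.

```python
import os
from getpass import getpass


def cas_credentials(env=os.environ, prompt=getpass):
    """Build username/password keyword arguments without hard-coding secrets."""
    username = env.get("CAS_USERNAME")  # hypothetical variable name
    if not username:
        raise RuntimeError("Set CAS_USERNAME before connecting")
    # Use CAS_PASSWORD (also hypothetical) if present; otherwise prompt without echoing.
    password = env.get("CAS_PASSWORD") or prompt(f"CAS password for {username}: ")
    return {"username": username, "password": password}
```

The returned dictionary can be unpacked into the constructor alongside your own host and port values, for example swat.CAS(hostname=my_host, port=my_port, **cas_credentials()).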
Great! Everything looks ready and we are connected to CAS! Let's now look at a second method of connecting.
Connecting to CAS using token authentication and environment variables
In this example, I am using SAS Viya for Learners. SAS Viya for Learners is a SAS Viya implementation created for educators and their students. To access a Jupyter notebook when using SAS Viya for Learners, log in using your username and password and launch the application. Once in the application, look at the bottom of the screen in the orange box and select the Jupyter notebook icon, as seen below.
After Jupyter opens, select the Python kernel and begin.
Now it's time to make the connection to CAS. First, I'm going to import the SWAT and os packages. The os package allows me to obtain environment variable values necessary to connect to CAS.
import swat
import os
Next, I want to obtain the values of a few environment variables. SAS Viya for Learners comes configured with the needed environment variables. To start, I'm going to use the os.environ.get method to obtain the CASHOST, CASPORT and SAS_VIYA_TOKEN values and set them equal to new variables.
hostValue = os.environ.get('CASHOST')
portValue = os.environ.get('CASPORT')
passwordToken = os.environ.get('SAS_VIYA_TOKEN')
The variables hostValue, portValue and passwordToken all contain the necessary values to connect to CAS. Let's use the swat.CAS constructor again with the newly created variables.
conn = swat.CAS(hostname=hostValue, port=portValue, password=passwordToken)
Let's view the output of the conn object.
display(conn)
CAS('svflhost.demo.sas.com', 5570, 'joe.test@sas.com', protocol='cas', name='py-session-1', session='efff4323-a862-bd6e-beea737b4249')
That's it. We're connected!
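One thing to keep in mind with this approach is that os.environ.get returns None when a variable is not set, and every value it does return is a string. The sketch below is one possible way to guard against both before connecting. It uses the same three variable names as above; the helper name cas_settings_from_env is my own.

```python
import os


def cas_settings_from_env(env=os.environ):
    """Collect CAS connection settings, failing early if any variable is missing."""
    names = ("CASHOST", "CASPORT", "SAS_VIYA_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "hostname": env["CASHOST"],
        "port": int(env["CASPORT"]),  # environment values are always strings
        "password": env["SAS_VIYA_TOKEN"],
    }
```

In an environment where the variables are set, the result could be passed straight to the constructor with swat.CAS(**cas_settings_from_env()).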
Summary
In summary, the most important thing to know is that there are multiple ways to connect to the CAS server. For additional approaches, see links below and also Joe Furbee's blog post, Authentication to SAS Viya: a couple of approaches. If you're not sure of the proper method, I recommend you discuss how to connect to CAS with the administrator of your environment.
That was just the start of the series, but it's the most important step! You can't do anything else unless you first authenticate and connect. Once you are connected to the CAS server, the next question should be, "What do I do now?" In the next post we will talk about sending commands to the CAS server in Working with CAS Actions and CASResults Objects.