Using Python to run jobs in your SAS Grid

0

Recent updates to SAS Grid Manager introduced many interesting new features, including the ability to handle more diverse workloads. In this post, we'll take a look at the steps required to get your SAS Grid Manager environment set up to accept jobs from outside of traditional SAS clients. We'll demonstrate the process of submitting Python code for execution in the SAS Grid.

Preparing your SAS Grid

Obviously, we need a SAS Grid Manager (SAS 9.4 Maintenance 6 or later) environment to be installed and configured. Once the grid is deployed, there's not a whole lot more to do on the SAS side. SAS Workspace Servers need to be configured for launching on the grid by converting them to load balanced Workspace Servers, changing the algorithm to 'Grid', and then selecting the relevant checkbox in SASApp – Logical Workspace Server properties to launch on the grid as shown below.

The only other things that might need configuring are, if applicable, .authinfo files and Grid Option Sets. Keep reading for more information on these.

Preparing your client machine

In this example scenario, the client is the Windows desktop machine where I will write my Python code. SAS is not installed here. Rather, we will deploy and use Jupyter Notebook as our IDE to write Python code, which we'll then submit to the SAS Grid. We can get Jupyter Notebook by installing Anaconda, which is a free, bundled distribution of Python and R that comes with a solid package management system. (The following steps are courtesy of my colleague, Allan Tham.)

First, we need to download and install Anaconda.

Once deployed, we can open Anaconda Navigator from the Start menu and from there, we can launch Jupyter Notebook and create a notebook.

Now it's time to configure the connection from Python to our SAS environment. To do this, we use the SAS-developed open-source SASPy module for Python, which is responsible for converting Python code to SAS code and running it in SAS. Installing SASPy is simple. First, we need to download and install a Java Runtime Environment as a prerequisite. After that, we can launch Anaconda Prompt (a command line interface) from the Start menu and use pip to install the SASPy package.

pip install saspy

Connection parameters are defined in a file named sascfg.py. In fact, the best practice is to use sascfg.py as a template, and create a copy of it, which should be named sascfg_personal.py. SASPy will read and look for connections in both files. For the actual connection parameters, we need to specify the connection method. This depends largely on your topology.

In my environment, I used a Windows client to connect to a Linux server using a standard IOM connection. The most appropriate SASPy configuration definition is therefore 'winiomlinux', which relies on a Java-based connection method to talk to the SAS workspace. This needs to be defined in the sascfg_personal.py file.

SAS_config_names=['winiomlinux']

We also need to specify the parameters for this connection definition as shown below.

# build out a local classpath variable to use below for Windows clients   CHANGE THE PATHS TO BE CORRECT FOR YOUR INSTALLATION 
cpW  =  "C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\sas.svc.connection.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\log4j.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\sas.security.sspi.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\sas.core.jar"
cpW += ";C:\\ProgramData\\Anaconda3\\Lib\\site-packages\\saspy\\java\\saspyiom.jar"

Note that we are referencing a number of JAR files in the classpath that normally come with a SAS deployment. Since we don't have SAS on the client machine, we can copy these from the Linux SAS server. In my lab, these were copied from /SASDeploymentManager/9.4/products/deploywiz__94526__prt__xx__sp0__1/deploywiz/.

Further down in the file, we can see the connection parameters to the SAS environment. Update the path to the Java executable, the SAS compute server host, and the port for the Workspace Server.

winiomlinux = {'java'   : 'C:\\Program Files\\Java\\jre1.8.0_221\\bin\\java',
            'iomhost'   : 'sasserver01.mydemo.sas.com',
            'iomport'   : 8591,
            'encoding'  : 'latin1',
            'classpath' : cpW
            }

Note: thanks to a recent contribution from the community, SASPy now also supports a native Windows connection method that doesn't require Java. Instead, it uses the SAS Integration Technologies client -- a component that is free to download from SAS. You'll already have this component if you've installed SAS Enterprise Guide, SAS Add-In for Microsoft Office, or Base SAS for Windows on your client machine. See instructions for this configuration in the SASPy documentation.

With the configuration set, we can import the SASPy module in our Jupyter Notebook.

Although we're importing sascfg_personal.py explicitly here, we can just call sascfg.py, as SASPy actually checks for sascfg_personal.py and uses that if it finds it. If not, it uses the default sascfg.py.

To connect to the grid from Jupyter Notebook using SASPy, we need to instantiate a SASsession object and specify the configuration definition (e.g. winiomlinux). You'll get prompted for credentials, and then a message indicating a successful connection will be displayed.

It's worth nothing that we could also specify the configuration file to use when we start a session here by specifying:

sas = saspy.SASsession(cfgfile='/sascfg_personal.py')

Behind the scenes, SAS will find an appropriate grid node on which to launch a Workspace Server session. The SASPy module works in any SAS 9.4 environment, regardless of whether it is a grid environment or not. It simply runs in a Workspace Server; in a SAS Grid environment, it just happens to be a Grid-launched Workspace Server.

Running in the Grid

Now let's execute some code in the Workspace Server session we have launched. Note that not all Python methods are supported by SASPy, but because the module is open source, anyone can add to or update the available methods. Refer to the API doc for more information.

In taking another example from Allan, let's view some of the content in a SASHELP data set.

The output is in ODS, which is converted (by SASPy) to a Pandas data frame to display it in Jupyter Notebook.

Monitoring workloads from SAS Grid Manager

SAS Workload Orchestrator Web interface allows us to view the IOM connections SASPy established from our client machine. We can see the grid node on which the Workspace Server was launched, and view some basic information. The job will remain in a RUNNING state until the connection is terminated (i.e. the SASsession is closed by calling the endsas() method. By the same token, creating multiple SASsessions will result in multiple grid jobs running.

To see more details about what actually runs, Workspace Server logs need to first be enabled. When we run myclass.head(), which will display the first 5 rows of the data set, we see the following written to Workspace Server log.

2019-08-14T02:47:34,713 INFO  [00000011] :sasdemo - 239        ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg style=HTMLBlue;
2019-08-14T02:47:34,713 INFO  [00000011] :sasdemo - 239      ! ods graphics on / outputfmt=png;
2019-08-14T02:47:34,779 INFO  [00000011] :sasdemo - NOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1
2019-08-14T02:47:34,780 INFO  [00000011] :sasdemo - 240        ;*';*";*/;
2019-08-14T02:47:34,782 INFO  [00000045] :sasdemo - 241        data _head ; set sashelp.prdsale (obs=5 ); run;
2019-08-14T02:47:34,782 INFO  [00000045] :sasdemo -
2019-08-14T02:47:34,783 INFO  [00000045] :sasdemo - NOTE: There were 5 observations read from the data set SASHELP.PRDSALE.
2019-08-14T02:47:34,784 INFO  [00000045] :sasdemo - NOTE: The data set WORK._HEAD has 5 observations and 5 variables.
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo - NOTE: DATA statement used (Total process time):
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo -       real time           0.00 seconds
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo -       cpu time            0.00 seconds
2019-08-14T02:47:34,787 INFO  [00000045] :sasdemo -

We can see the converted code, which in this case was a data step which creates a work table based on PRDSALE table with obs=5. Scrolling down, we also see the code that prints the output to ODS and converts it to a data frame.

Additional Options

Authinfo files

The sascfg_personal.py file has an option for specifying an authkey, which is an identifier that maps to a set of credentials in an .authinfo file (or _authinfo on Windows). This can be leveraged to eliminate the prompting for credentials. For example, if your authinfo file looks like:

IOM_GELGrid_SASDemo user sasdemo password lnxsas

your configuration defintion in your sascfg_personal.py should look like:

winiomlinux = {'java'   : 'C:\\Program Files\\Java\\jre1.8.0_221\\bin\\java',
            'iomhost'   : 'sasserver01.mydemo.sas.com',
            'iomport'   : 8591,
            'encoding'  : 'latin1',
            'authkey' : 'IOM_GELGrid_SASDemo'
            'classpath' : cpW
            }

There are special rules for how to secure the authinfo file (making it readable only by you), so be sure to refer to the instructions.

Grid Option Sets

What if you want your code to run in the grid with certain parameters or options by default? For instance, say you want all Python code to be executed in a particular grid queue. SASPy can do this by leveraging Grid Option Sets. The process is outlined here, but in short, a new SASPy 'SAS Application' has to be configured in SAS Management Console, which is then used to the map to the Grid Options Set (created using the standard process).

More Information

My sincere thanks to Allan Tham and Greg Wootton for their valued contributions to this post.

Please refer to the official SAS Grid documentation for more information on SAS Grid Manager in SAS 9.4M6.

Thank you for reading. I hope the information provided in this post has been helpful. Please feel free to comment below to share your own experiences.

Share

About Author

Ajmal Farzam

Senior Technical Architect

Ajmal Farzam has over 10 years of experience implementing SAS® solutions globally to clients in a broad range of industries. He has been at SAS® since 2010, working in various roles in Professional Services, Technical Support and R&D. His areas of focus are platform administration, deployment and architecture. Ajmal is currently a Senior Technical Architect in the Global Enablement and Learning practice.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top