Thanks to a new open source project from SAS, Python coders can now bring the power of SAS into their Python scripts. The project is SASPy, and it's available on the SAS Software GitHub. It works with SAS 9.4 and higher, and requires Python 3.x.
I spoke with Jared Dean about the SASPy project. Jared is a Principal Data Scientist at SAS and one of the lead developers on SASPy and a related project called Pipefitter. Here's a video of our conversation, which includes an interactive demo. Jared is obviously pretty excited about the whole thing.
Use SAS like a Python coder
SASPy brings a "Pythonic" sensibility to using SAS. That means that all of your access to SAS data and methods is surfaced through objects and syntax that are familiar to Python users. This includes the ability to exchange data via pandas, the ubiquitous Python data analysis framework. And even the native SAS objects are accessed in a very "pandas-like" way.
import saspy
import pandas as pd
sas = saspy.SASsession(cfgname='winlocal')
cars = sas.sasdata("CARS","SASHELP")
cars.describe()
The output is what you expect from pandas, but with the statistics that SAS users are accustomed to. PROC MEANS, anyone?
In: cars.describe()
Out:
      Variable  Label    N  NMiss   Median          Mean        StdDev  \
0         MSRP      .  428      0  27635.0  32774.855140  19431.716674
1      Invoice      .  428      0  25294.5  30014.700935  17642.117750
2   EngineSize      .  428      0      3.0      3.196729      1.108595
3    Cylinders      .  426      2      6.0      5.807512      1.558443
4   Horsepower      .  428      0    210.0    215.885514     71.836032
5     MPG_City      .  428      0     19.0     20.060748      5.238218
6  MPG_Highway      .  428      0     26.0     26.843458      5.741201
7       Weight      .  428      0   3474.5   3577.953271    758.983215
8    Wheelbase      .  428      0    107.0    108.154206      8.311813
9       Length      .  428      0    187.0    186.362150     14.357991

       Min       P25      P50      P75       Max
0  10280.0  20329.50  27635.0  39215.0  192465.0
1   9875.0  18851.00  25294.5  35732.5  173560.0
2      1.3      2.35      3.0      3.9       8.3
3      3.0      4.00      6.0      6.0      12.0
4     73.0    165.00    210.0    255.0     500.0
5     10.0     17.00     19.0     21.5      60.0
6     12.0     24.00     26.0     29.0      66.0
7   1850.0   3103.00   3474.5   3978.5    7190.0
8     89.0    103.00    107.0    112.0     144.0
9    143.0    178.00    187.0    194.0     238.0
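The pandas exchange mentioned earlier can be sketched as a small round trip. This is a minimal sketch that assumes `sas` is a connected session like the one created above. `df2sd` and `to_df` are the SASPy calls for moving data in each direction; the helper name `round_trip` and the default table name are my own illustrations, not part of SASPy.

```python
def round_trip(sas, df, table="mydata"):
    """Copy a pandas DataFrame into SAS, then read it back as a DataFrame.

    `sas` is assumed to be a connected saspy.SASsession; `table` is an
    illustrative name for the data set created in the WORK library.
    """
    # pandas -> SAS: writes the rows to WORK.<table> and returns a SASdata object
    sas_table = sas.df2sd(df, table=table, libref="work")
    # SAS -> pandas: pull the same rows back into a DataFrame
    return sas_table.to_df()
```

With a live session you might call `round_trip(sas, pd.DataFrame({"x": [1, 2, 3]}))` and compare the result to the frame you sent.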
SASPy also provides high-level Python objects for the most popular and powerful SAS procedures. These are organized by SAS product, such as SAS/STAT, SAS/ETS and so on. To explore, call dir() on your SAS session object. In this example, I created a sasstat object and used dot<TAB> to list the available SAS analyses.
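If you are working outside an interactive shell and don't have tab completion, the same exploration can be done in plain Python. Here is a minimal sketch; `list_analyses` is a hypothetical helper of my own, and it simply filters what `dir()` returns down to public, callable names.

```python
def list_analyses(product):
    """Return the sorted public method names on a SASPy product object."""
    return sorted(
        name
        for name in dir(product)
        # skip private/dunder names and plain (non-callable) attributes
        if not name.startswith("_") and callable(getattr(product, name))
    )
```

Assuming a connected session named `sas`, you would create the product object with `stat = sas.sasstat()` and then call `list_analyses(stat)`. What comes back depends on your SASPy version and on what your SAS license includes.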
SASPy provides Python access to all of the features that your SAS license allows. The SAS Pipefitter project extends the SASPy project by providing a high-level API for building analytical pipelines. With SAS Pipefitter, you can easily create repeatable workflows that feature advanced analytics and machine learning algorithms. In our video interview, Jared presents a cool example of a decision tree applied to the passenger survival factors on the Titanic. It's powered by PROC HPSPLIT behind the scenes, but Python users don't need to know all of that "inside baseball."
Installing SASPy and getting started
Like most things Python, installing the SASPy package is simple. You can use the pip package installer to fetch the latest version:
pip install saspy
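After the install finishes, a quick check from Python confirms that the package is visible to the interpreter you plan to use. This sketch relies only on the standard library (`importlib.metadata`, available in Python 3.8 and later); the helper name `installed_version` is my own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None if saspy is not installed in this environment
print("saspy version:", installed_version("saspy"))
```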
However, since you need to connect to a SAS session to get to the SAS goodness, you will need some additional files to broker that connection. Most notably, you need a few Java jar files that SAS provides. You can find these in the SAS Deployment Manager folder for your SAS installation:
../deploywiz/sas.svc.connection.jar
../deploywiz/log4j.jar
../deploywiz/sas.security.sspi.jar
../deploywiz/sas.core.jar
The jar files are compatible between Windows and Unix, so if you find them in a Unix SAS install you can still copy them to your Python Windows client. You'll need to modify the sascfg.py file (installed with the SASPy package) to point to where you've stashed these. If you are using local SAS on Windows, you also need to make sure that sspiauth.dll is in your Windows system PATH. The easiest method is to add SASHOME\SASFoundation\9.4\core\sasexe to your system PATH variable.
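To make the sascfg.py change concrete, here is a sketch of what a local Windows entry might look like. The folder `C:\saspy_jars` is a hypothetical location for the copied jars, and the dictionary keys (`java`, `encoding`, `classpath`) follow the layout I've seen in the SASPy configuration documentation. Treat this as a starting point and check the documentation for the keys your version expects.

```python
# sascfg.py -- sketch of a local Windows configuration for SASPy.
# JAR_DIR is a hypothetical folder; point it at wherever you copied the jars.
JAR_DIR = r"C:\saspy_jars"

# The four jar files copied from the SAS Deployment Manager folder
JAR_FILES = [
    "sas.svc.connection.jar",
    "log4j.jar",
    "sas.security.sspi.jar",
    "sas.core.jar",
]

# Java on Windows expects classpath entries separated by semicolons
cpW = ";".join(JAR_DIR + "\\" + jar for jar in JAR_FILES)

# Names of the configurations defined in this file
SAS_config_names = ["winlocal"]

# Matches cfgname='winlocal' in saspy.SASsession(cfgname='winlocal')
winlocal = {
    "java": "java",              # assumes java is on your PATH
    "encoding": "windows-1252",  # assumed session encoding; adjust to match your SAS
    "classpath": cpW,
}
```

The name `winlocal` is what ties this entry to the `cfgname='winlocal'` argument used when creating the session.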
All of this is documented in the "Installation and Configuration" section of the project documentation. The connectivity options support an impressively diverse set of SAS configs: Windows, Unix, SAS Grid Computing, and even SAS on the mainframe!
Download, comment, contribute
SASPy is an open source project, and all of the Python code is available for your inspection and improvement. The developers at SAS welcome you to give it a try and enter issues when you see something that needs to be improved. And if you're a hotshot Python coder, feel free to fork the project and issue a pull request with your suggested changes!