Thanks to a new open source project from SAS, Python coders can now bring the power of SAS into their Python scripts. The project is SASPy, and it's available on the SAS Software GitHub. It works with SAS 9.4 and higher, and requires Python 3.x.
I spoke with Jared Dean about the SASPy project. Jared is a Principal Data Scientist at SAS and one of the lead developers on SASPy and a related project called Pipefitter. Here's a video of our conversation, which includes an interactive demo. Jared is obviously pretty excited about the whole thing.
Use SAS like a Python coder
SASPy brings a "Python-ic" sensibility to using SAS. That means that your access to SAS data and methods is surfaced using objects and syntax that are familiar to Python users. This includes the ability to exchange data via pandas, the ubiquitous Python data analysis framework. And even the native SAS objects are accessed in a very "pandas-like" way.
import saspy
import pandas as pd

sas = saspy.SASsession(cfgname='winlocal')
cars = sas.sasdata("CARS","SASHELP")
cars.describe()
The output is what you expect from pandas...but with statistics that SAS users are accustomed to. PROC MEANS anyone?
In[3]: cars.describe()
Out[3]:
      Variable  Label    N  NMiss   Median          Mean        StdDev  \
0         MSRP      .  428      0  27635.0  32774.855140  19431.716674
1      Invoice      .  428      0  25294.5  30014.700935  17642.117750
2   EngineSize      .  428      0      3.0      3.196729      1.108595
3    Cylinders      .  426      2      6.0      5.807512      1.558443
4   Horsepower      .  428      0    210.0    215.885514     71.836032
5     MPG_City      .  428      0     19.0     20.060748      5.238218
6  MPG_Highway      .  428      0     26.0     26.843458      5.741201
7       Weight      .  428      0   3474.5   3577.953271    758.983215
8    Wheelbase      .  428      0    107.0    108.154206      8.311813
9       Length      .  428      0    187.0    186.362150     14.357991

       Min       P25      P50      P75       Max
0  10280.0  20329.50  27635.0  39215.0  192465.0
1   9875.0  18851.00  25294.5  35732.5  173560.0
2      1.3      2.35      3.0      3.9       8.3
3      3.0      4.00      6.0      6.0      12.0
4     73.0    165.00    210.0    255.0     500.0
5     10.0     17.00     19.0     21.5      60.0
6     12.0     24.00     26.0     29.0      66.0
7   1850.0   3103.00   3474.5   3978.5    7190.0
8     89.0    103.00    107.0    112.0     144.0
9    143.0    178.00    187.0    194.0     238.0
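The pandas exchange mentioned above goes in both directions. Here is a minimal sketch, assuming you already have a connected SASsession object. The helper name round_trip and the table name cars_copy are my own illustrations, not part of SASPy; df2sd() and to_df() are the SASPy calls doing the work.

```python
def round_trip(sas, df, table="cars_copy", libref="WORK"):
    """Copy a pandas DataFrame into SAS, then read it back as a DataFrame.

    `sas` is assumed to be a connected saspy.SASsession.
    `table` and `libref` are illustrative defaults, not SASPy requirements.
    """
    # DataFrame -> SAS data set; df2sd returns a SASdata object.
    sas_table = sas.df2sd(df, table=table, libref=libref)
    # SAS data set -> DataFrame.
    return sas_table.to_df()


# Usage, once a session exists (not run here because it needs a live SAS):
# sas = saspy.SASsession(cfgname='winlocal')
# cars_df = sas.sasdata("CARS", "SASHELP").to_df()
# copy_df = round_trip(sas, cars_df)
```

Keep in mind that to_df() pulls the whole table into Python memory, so for large SAS data sets you may prefer to summarize on the SAS side first.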
SASPy also provides high-level Python objects for the most popular and powerful SAS procedures. These are organized by SAS product, such as SAS/STAT, SAS/ETS and so on. To explore, issue a dir() command on your SAS session object. In this example, I've created a sasstat object and I used dot<TAB> to list the available SAS analyses:
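If you are not working in an environment with tab completion, the same exploration can be sketched with plain dir(). The helper public_names below is my own illustration, not part of SASPy; it works on any Python object, including the object returned by sas.sasstat().

```python
def public_names(obj):
    """List the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Usage with a live session (not run here because it needs a SAS connection):
# stat = sas.sasstat()          # high-level object for SAS/STAT procedures
# print(public_names(sas))      # what the session object offers
# print(public_names(stat))     # the analyses available through sasstat
```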
SASPy provides Python access to all of the features that your SAS license allows. The SAS Pipefitter project extends the SASPy project by providing a high-level API for building analytical pipelines. With SAS Pipefitter, you can easily create repeatable workflows that feature advanced analytics and machine learning algorithms. In our video interview, Jared presents a cool example of a decision tree applied to the passenger survival factors on the Titanic. It's powered by PROC HPSPLIT behind the scenes, but Python users don't need to know all of that "inside baseball."
Installing SASPy and getting started
Like most things Python, installing the SASPy package is simple. You can use the pip package installer to fetch the latest version:
pip install saspy
The configuration steps will vary depending on your SAS environment. The connectivity options support an impressively diverse set of SAS configs: Windows, Unix, SAS Grid Computing, and even SAS on the mainframe! All of this is documented in the "Installation and Configuration" section of the project documentation.
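To give a feel for what the configuration looks like, here is a sketch of a sascfg_personal.py with two local definitions. Treat it as an illustration under assumptions rather than a working config for your site: the SAS path is only an example location, the encodings depend on how your SAS session is set up, and some SASPy releases need extra keys (such as a classpath for the IOM method) that I have left out. The project documentation is the authority here.

```python
# sascfg_personal.py: a sketch only; adjust names, paths and encodings for your site.

# Names of the configurations defined below. The cfgname= value passed to
# saspy.SASsession() must match one of these.
SAS_config_names = ['winlocal', 'default']

# Local SAS on Windows, reached through the IOM access method.
winlocal = {
    'java': 'java',              # assumes a java executable is on your PATH
    'encoding': 'windows-1252',  # Python's name for the SAS wlatin1 encoding
}

# Local SAS on Linux/Unix, started over STDIO.
default = {
    'saspath': '/opt/sasinside/SASHome/SASFoundation/9.4/bin/sas_u8',  # example path
    'encoding': 'utf-8',
}
```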
If you're new to Python but well-versed in SAS, I have a recommendation for you. Two SAS and Python enthusiasts -- Isaiah Lankham and Matthew Slaughter -- have created a tutorial that shows how to use SAS (via SASPy) in Python applications. Isaiah and Matthew explain some of the Python basics and relate them to SAS concepts, then they show how to put it all together.
Download, comment, contribute
SASPy is an open source project, and all of the Python code is available for your inspection and improvement. The developers at SAS welcome you to give it a try and enter issues when you see something that needs to be improved. And if you're a hotshot Python coder, feel free to fork the project and issue a pull request with your suggested changes!
29 Comments
Awesome stuff.
Is there a way for Python to interact through SAS Enterprise Guide?
That depends what you want to do. Want to invoke a Python script? You can use the System Command task (found on this blog). What did you have in mind?
If I only have SAS Workspace Server for Local Access licensed, is it possible to use any of the other IOM configurations? Or would winlocal be the only option? I'm expecting the answer to be no, but thought I'd double check just in case =)
I think you could use winlocal OR you could use IOM with an object spawner, metadata server, and workspace all on the same machine (localhost). If you have all of that set up....
Do I need to have SAS licensed software to read SAS data files in python using saspy?
Yes, you need to have a SAS session that SASPy can connect to. This can be local SAS or a remote SAS server with a configured connection. It's also available to try for free with SAS University Edition.
How do I connect to a SAS server from the SASPy module? We are planning to build a user interface in Python for SAS tools (to trigger stored processes and access datasets for analysis). Are there any other options for achieving this? Note: We are accessing SAS EG through Citrix Receiver.
Any help would be appreciated. Thanks!
SASPy connects to SAS Workspace servers (local or using Integration Technologies). If you need to connect to a Stored Process server, it doesn't offer a method for that.
However, if you're running a true SAS stored process, then you can surface that as a Web Service. Python offers standard methods for calling REST web services -- stored process authors and maybe a SAS admin will need to help set that up. You can also use the SAS Workspace and PROC STP to run stored processes from SASPy, and save the output data from there (convert to pandas, whatever you need). See the Stored Process Developer's Guide for more information on both of these methods.
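For the PROC STP route, a rough sketch might look like the following. The helper names and the stored process path are hypothetical, and I am assuming your workspace session already has the metadata connection it needs to locate the stored process. Only submit() and its LOG/LST result keys come from SASPy itself.

```python
def sas_quote(text):
    """Return text as a single-quoted SAS string literal (embedded quotes doubled)."""
    return "'" + text.replace("'", "''") + "'"


def build_stp_code(program_path):
    """Build a minimal PROC STP step that runs one stored process."""
    return f"proc stp program={sas_quote(program_path)};\nrun;"


# Usage with a live session (not run here because it needs a SAS connection):
# code = build_stp_code("/Shared/Reports/sales_summary")   # hypothetical path
# result = sas.submit(code)
# print(result["LOG"])   # check the SAS log for errors
# print(result["LST"])   # listing output, if the stored process produced any
```

Single quotes are used so that SAS does not try to resolve macro references inside the path.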
Thanks. That helps. I'm using an IOM config to connect to the SAS server. I'm able to trigger components and other things. Is there any way to browse the SAS server directories to select files and datasets in SASPy (for a user interface)? How do I get the SAS server directory tree in SASPy?
Good questions. I don't know whether SASPy provides access to the IOM FileService or the DataService. I suggest asking on the Issues board of the saspy project.
On Windows, could I use SASPy to choose a SAS session encoding such as UTF-8 rather than the default wlatin1?
Many thanks.
Yes, you would need to change this in sascfg.py (or sascfg_personal.py) -- there is an encoding field. Or you may be able to change it in your local SAS by modifying the default sasv9.cfg in your SAS install to point to the nls/u8/sasv9.cfg file.
Hello,
Is there a demo environment where this can be safely tried?
Yes - you can try this out from SAS University Edition. Free to download and use for learning.
Hi, how can a SAS LASR Analytic Server library be assigned using SASPy in Python?
Thanks
Yes, this should work as long as your SAS environment is connected to LASR. Use the submit() method on the SAS session object to submit your LIBNAME statement and any other LASR connection info you need. The SASPy doc for the SASsession submit() method includes an example with a Teradata libname, which is a similar concept.
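As a sketch of that approach, the helper below assembles a LIBNAME statement that you can pass to submit(). The function name build_libname is my own, and the engine name, host, port and tag in the usage lines are placeholders to replace with your site's values. Option values are passed through unchanged, so quote them yourself where SAS expects a string.

```python
import re

# A SAS libref is 1 to 8 characters: a letter or underscore, then letters, digits or underscores.
_LIBREF = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def build_libname(libref, engine, options=None):
    """Build a LIBNAME statement from a libref, an engine name and name=value options."""
    if not _LIBREF.fullmatch(libref):
        raise ValueError(f"not a valid SAS libref: {libref!r}")
    parts = ["libname", libref, engine]
    parts += [f"{name}={value}" for name, value in (options or {}).items()]
    return " ".join(parts) + ";"


# Usage with a live session (placeholder connection values, not run here):
# code = build_libname("lasr1", "sasiola",
#                      {"host": "'lasr.example.com'", "port": 10010, "tag": "'hps'"})
# result = sas.submit(code)
# print(result["LOG"])                 # confirm the library was assigned
# tbl = sas.sasdata("MYTABLE", "lasr1")  # hypothetical table name
```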
Can SASPy access/read SPDS datasets?
Yes, I believe so. If you can use the SAS session submit() method to assign the SPDS library, then you can access those data members just like any data set, and bring them into Pandas dataframes. Note that SPDS data can be large -- and might be indexed to optimize for access/analytics. So if you bring these into the Python environment directly you might lose some of that benefit. Of course, you can always leave the data in SAS and run SAS code directly via submit().
Why would someone want to use SASPy versus the normal data packages for Python like Pandas and NumPy?
Hi Catherine, our customers have several reasons for this. Most commonly it's used when an org has invested in SAS to manage data and analytics, and Python developers in the same org need to access that data and analytics. SASPy connects to a SAS environment to access data and code/models that are maintained in your SAS ecosystem. It opens the door for more collaboration, removing a barrier that might otherwise be present when one group works in SAS but another works in Python.
I get some errors when connecting:
The SAS Config name specified was not found. Please enter the SAS Config you wish to use. Available Configs are: ['default'] default
The OS Error was:
No such file or directory
SAS Connection failed. No connection established. Double check your settings in sascfg_personal.py file.
Attempted to run program /opt/sasinside/SASHome/SASFoundation/9.4/bin/sas_u8 with the following parameters:['/opt/sasinside/SASHome/SASFoundation/9.4/bin/sas_u8', '-nodms', '-stdio', '-terminal', '-nosyntaxcheck', '-pagesize', 'MAX', '']
If no OS Error above, try running the following command (where saspy is running) manually to see what is wrong:
/opt/sasinside/SASHome/SASFoundation/9.4/bin/sas_u8 -nodms -stdio -terminal -nosyntaxcheck -pagesize MAX
SAS Connection failed. No connection established. Double check your settings in sascfg_personal.py file.
Attempted to run program /opt/sasinside/SASHome/SASFoundation/9.4/bin/sas_u8 with the following parameters:['/opt/sasinside/SASHome/SASFoundation/9.4/bin/sas_u8', '-nodms', '-stdio', '-terminal', '-nosyntaxcheck', '-pagesize', 'MAX', '']
Try running the following command (where saspy is running) manually to see if you can get more information on what went wrong:
/opt/sasinside/SASHome/SASFoundation/9.4/bin/sas_u8 -nodms -stdio -terminal -nosyntaxcheck -pagesize MAX
No SAS process attached. SAS process has terminated unexpectedly.
Invalid response from SAS on inital submission. printing the SASLOG as diagnostic
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipykernel_8698/1316846176.py in <module>
1 import saspy
2 import pandas as pd
----> 3 sas = saspy.SASsession(cfgname='winlocal')
4 cars = sas.sasdata("CARS","SASHELP")
5 cars.describe()
~/anaconda3/lib/python3.8/site-packages/saspy/sasbase.py in __init__(self, **kwargs)
573 if self.sascfg.mode != 'HTTP':
574 try:
--> 575 self.pyenc = sas_encoding_mapping[self.sascei]
576 except KeyError:
577 logger.fatal("Invalid response from SAS on inital submission. printing the SASLOG as diagnostic")
KeyError: 'No SAS process attached. SAS process has terminated unexpectedly.'
You need to edit the sascfg_personal.py file to connect to your SAS environment. Full documentation and examples are in the SASPy documentation. If you cannot get it working, I suggest you enter an issue on the GitHub page.
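Two things stand out in the log above: the code asks for cfgname='winlocal' while the only available config is 'default', and the operating system could not find the sas_u8 program at the configured path. A small pre-flight check along these lines can catch both before you start a session. This is a sketch with a helper name of my own (check_config), not a SASPy feature, and it only knows about the saspath setting.

```python
import os


def check_config(cfgname, configs):
    """Return a list of problems found; an empty list means nothing obvious is wrong.

    `configs` maps each configuration name to its settings dict, mirroring
    the dictionaries defined in sascfg_personal.py.
    """
    if cfgname not in configs:
        available = ", ".join(sorted(configs)) or "(none)"
        return [f"config '{cfgname}' is not defined; available: {available}"]

    problems = []
    saspath = configs[cfgname].get("saspath")
    # saspath only applies to configs that launch a local SAS executable.
    if saspath is not None and not os.path.isfile(saspath):
        problems.append(f"saspath does not point to a file: {saspath}")
    return problems


if __name__ == "__main__":
    # Settings taken from the error log above.
    configs = {"default": {"saspath": "/opt/sasinside/SASHome/SASFoundation/9.4/bin/sas_u8"}}
    for name in ("winlocal", "default"):
        print(name, check_config(name, configs) or "looks OK")
```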