By Sundaresh Sankaran & Ali Aiello, Global Technology Practice, SAS Institute
Many analytics developers prefer Python for its ease of use, versatility and community support. Acknowledging this from an early stage of Viya’s evolution, SAS has progressively provided access to Python from within its applications, the latest being SAS Studio. The PYTHON procedure (PROC PYTHON), the Python code editor and the Python code step facilitate low-code analytics that call Python and SAS from a common interface. Data scientists also appreciate the connection to Python and R through the Model Studio Open-source Code node. Older methods of interaction include the swat and sas_kernel packages running on Python clients.
You’d think that’s enough to keep these guys happy! Yet, the analytics user community continues to request more ease of access, flexibility and customization. Here are some statements we have heard from customers:
- “I’d like to quickly install a new Python or R package for testing purposes.”
- “My custom Python solution has hard dependencies on specific packages and versions.”
- “I want to try out a package for a limited period without breaking existing code.”
- “I want to install my packages, not wait for an administrator.”
- “I'd like to access a customized environment on demand, with specialized packages.”
- “I have different package needs when using a Python notebook to connect to SAS Viya, contrasted with using Python from within SAS Viya.”
- “I want to be able to easily port my solution to a colleague, with minimal additional instructions.”
Users have sophisticated expectations and requirements for the interaction of SAS and Python. These indicate a desire to retain current practices and adopt a uniform approach, regardless of where Python is accessed from. While SAS has long promoted its ability to integrate with Python and other open-source languages, this extent of customization has not always been possible in Viya. However, with recent releases, we can now accomplish these asks and more. This allows a larger universe of users to participate in flexible, customizable and easy analytic development on a common unified platform.
How can Python users leverage multiple environments?
The end-user experience can be delivered using two approaches. The first is suitable for cases where administrators wish to curate and control additional environments, based on user requirements. As shown in the short video below, multiple Python environments are built and made available on the SAS platform, catering to different versions, personas and package requirements. Users switch between environments they have access to, using compute contexts created for this purpose. Each compute context is tied to a specified Python environment. Updates to this Python environment, or even adding a new environment (and compute context), can be scripted and executed in a non-intrusive manner, providing minimal user disruption.
The second approach is more liberal and provides an even more enjoyable experience, given that users want the flexibility to install packages of choice. Note the analytical process flow (called, rather helpfully, a Flow) in the next video, where users start off by defining a requirements file specifying packages they wish to use. A custom component (called a Step) creates a virtual Python environment (making use of a virtualization package) and installs everything necessary. Not only can users reuse previously defined environments by persisting them in a location for later activation, they can also spin down the virtual environment once it has served its purpose. The requirements file can be ported along with the analytical solution so that other colleagues can use the same package requirements, with no need to pre-install packages in their Python environments.
How does SAS typically integrate with Python? Let’s look at the underlying architecture through two modes – “Python outside” and “Python inside”. “Python outside” has been around since SAS Viya provided access to Python. It makes a reasonable assumption that many organizations already have Python in place and SAS Viya is new to the party. Packages such as swat and the SAS kernel facilitate connection to SAS servers, with optional use of Python clients such as Jupyter notebook.
More recently, there has been a growing desire to call Python from within SAS applications – “Python inside”. In a containerized environment, the magic that users see in their applications is enabled by configuring certain paths at the backend, which define ‘where’ a Python environment may be found. This configuration is carried out through a deployment asset called the open-source config transformers, which make sure that the ephemeral, short-lived resources governing a SAS session have access to your Python of choice.
Here comes the bonus – even more recently, a new deployment asset called the SAS Configurator for Open Source (or, by its technical name, sas-pyconfig) also allows a Python environment to be defined and built through a Kubernetes job, right at the deployment stage! Make that plural – multiple environments! Known as profiles, multiple Python environments, each built with a different version and a different set of packages, cater to the requirements of different user personas or projects.
How do administrators build multiple environments?
We use the SAS Configurator for Open Source, which was released in December 2021 as a utility to simplify the “download, configuration, build and installation of Python from source”. As referred to in the architecture diagram, one or more Python environments are built in Persistent Volumes, accessible through a Persistent Volume Claim which always has a standard name pattern (sas-pyconfig) and a location pattern such as /opt/sas/viya/home/sas-pyconfig. Therefore, the Python executable for an environment named “default” can be accessed within a pod at /opt/sas/viya/home/sas-pyconfig/default/bin/python3. Similarly, an additional environment called “datascientist” can be accessed from /opt/sas/viya/home/sas-pyconfig/datascientist/bin/python3.
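The naming pattern described above can be expressed as a small helper. This is only a minimal sketch: it assumes every environment follows the /opt/sas/viya/home/sas-pyconfig/<name>/bin/python3 layout quoted in this article, and the function name python_path_for is hypothetical (it is not part of any SAS package).

```python
from pathlib import PurePosixPath

# Mount location quoted in the article; adjust if your deployment differs.
PYCONFIG_ROOT = PurePosixPath("/opt/sas/viya/home/sas-pyconfig")


def python_path_for(profile: str) -> str:
    """Return the expected python3 path for a named environment (hypothetical helper)."""
    # Reject empty names and names that would escape the profile directory.
    if not profile or "/" in profile:
        raise ValueError(f"invalid profile name: {profile!r}")
    return str(PYCONFIG_ROOT / profile / "bin" / "python3")


if __name__ == "__main__":
    for name in ("default", "datascientist"):
        print(name, "->", python_path_for(name))
```

A helper like this is handy when scripting the later configuration steps, since each environment name then maps to exactly one interpreter path.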
Once we have installed Python, it’s time to make SAS applications aware of where Python is located. This is accomplished by creating a volume mount to the PV through a set of transformers called open-source-config. These transformers have been around for some time and can be used even in cases where organizations choose to use an existing Python environment or set one up separately.
Upon successful configuration of the above transformers, the following variables (known as PATH variables or macro variables) are assigned values that are passed on to the downstream SAS application sessions that are initiated. For example, the variable PROC_PYPATH contains the location of the Python installation referred to above and is used when SAS Studio sessions are initiated. Similarly, the variable DM_PYPATH contains a path to the Python used within the Open-Source Code node. Following the example provided earlier, the “datascientist” profile would tend to be suitable for DM_PYPATH, thus giving us the following two variables:
    configMapGenerator:
    - name: sas-open-source-config-python
      literals:
      - PROC_PYPATH=/opt/sas/viya/home/sas-pyconfig/default/bin/python3
      …
      - DM_PYPATH=/opt/sas/viya/home/sas-pyconfig/datascientist/bin/python3
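When several environments are in play, the literal lines above can be generated rather than typed by hand. The following is a rough sketch under stated assumptions: the mapping of variable to environment name is supplied by the administrator, the path layout is the one quoted in this article, and render_literals is a hypothetical helper. It produces only the two literals shown above and does not attempt to reproduce the elided entries.

```python
# Root of the environments, as quoted in the article (an assumption for other deployments).
PYCONFIG_ROOT = "/opt/sas/viya/home/sas-pyconfig"


def render_literals(assignments: dict[str, str]) -> list[str]:
    """Turn {variable name: environment name} into configMap literal lines."""
    return [
        f"{variable}={PYCONFIG_ROOT}/{profile}/bin/python3"
        for variable, profile in assignments.items()
    ]


if __name__ == "__main__":
    literals = render_literals({
        "PROC_PYPATH": "default",       # used by SAS Studio sessions
        "DM_PYPATH": "datascientist",   # used by the Open-Source Code node
    })
    for line in literals:
        print(f"- {line}")
```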
To simplify the process from the user’s perspective, the paths above are wrapped in different Compute Contexts. A SAS Viya compute context simply refers to information passed over to a SAS launcher service, which then starts a requested session using such information. One can use applications such as SAS Environment Manager or scripts calling REST APIs to create a compute context. Administration utilities such as the sas-viya Command Line Interface also serve a useful purpose in automating the creation of a new compute context. The process is quick and leads to a new compute context appearing whenever you click on the context icon in SAS Studio.
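To make the idea of “a context tied to a Python environment” more concrete, here is a hedged sketch that only assembles a context definition as JSON; it does not call any service. Everything in it is an assumption for illustration: the field names (launchContext, launchType, environment, autoExecLines), the launcher context name and the use of an OPTIONS SET= autoexec line to point PROC_PYPATH at an environment should all be verified against the SAS Viya Compute service documentation before use. The helper name build_context_payload is hypothetical.

```python
import json


def build_context_payload(name: str, python_path: str,
                          launcher: str = "SAS Studio launcher context") -> dict:
    """Assemble an illustrative compute context definition (field names are assumptions)."""
    return {
        "name": name,
        "description": f"Compute context using Python at {python_path}",
        # Assumed: the launcher context that starts the session.
        "launchContext": {"contextName": launcher},
        "launchType": "service",
        "environment": {
            # Assumed mechanism: set the variable when the session starts.
            "autoExecLines": [f'options set=PROC_PYPATH="{python_path}";'],
        },
    }


if __name__ == "__main__":
    payload = build_context_payload(
        "Data Scientist Python context",
        "/opt/sas/viya/home/sas-pyconfig/datascientist/bin/python3",
    )
    print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from the submission makes it easy to review or version-control the definitions before an administrator pushes them through SAS Environment Manager, a REST call or the sas-viya CLI.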
How do users define multiple virtual environments?
Some organizations may like to give users more flexibility in choosing the packages they wish to install on top of a base Python environment. We enable this through a component known as a custom step. The custom step is a GUI component containing a program within SAS Studio that can be inserted as part of a low-code analytics flow (a SAS Studio Flow). In this case, end-users make use of a step to set up a virtual environment, install packages and activate it for further use within the session. Note the screenshot below, which illustrates how this step is configured. Parameters requested from the user are:
- The name of the virtual environment – for identification purposes
- The location of a desired requirements.txt file, which is used for installing desired packages
- As an optional feature, the user may like to persist the definition of this virtual environment elsewhere on disk, so that the same can be activated later if desired.
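The three parameters above map naturally onto two commands: create the environment, then install from the requirements file. The sketch below is not the custom step’s actual program; it is a minimal illustration that assumes the virtualenv package is installed in the base Python, a POSIX-style bin/python layout inside the environment, and a temporary directory as the default location when no persistent location is given. The names plan_virtual_env and run_plan are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional


def plan_virtual_env(name: str, requirements: str,
                     persist_dir: Optional[str] = None,
                     python: str = sys.executable) -> List[List[str]]:
    """Return the commands that create a virtual environment and install packages."""
    # Persist under the chosen location if given, else use a temporary directory.
    base = Path(persist_dir) if persist_dir else Path(tempfile.gettempdir())
    env_dir = base / name
    env_python = env_dir / "bin" / "python"  # POSIX layout assumed
    return [
        [python, "-m", "virtualenv", str(env_dir)],
        [str(env_python), "-m", "pip", "install", "-r", requirements],
    ]


def run_plan(commands: List[List[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the plan only; call run_plan(...) to actually build the environment.
    for command in plan_virtual_env("demo", "requirements.txt"):
        print(" ".join(command))
```

Separating the plan from its execution lets a user inspect exactly what will run before any packages are installed.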
In addition, other steps could be created to help port solutions. For example, a “Freeze Requirements” step outputs a requirements file that can be provided to a colleague (along with other analytical programs) so that they can recreate the environment with exactly the same packages and versions. Finally, a “Spin Down” step deactivates the current virtual environment, after which Python reverts to the base environment. To build virtual environments, the base Python environment requires a Python package called virtualenv to be installed.
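The idea behind a freeze step can be illustrated with the standard library alone. This is a sketch rather than the step’s actual code: it assumes that pinning every installed distribution as name==version is the desired output, and the helper names installed_packages and format_requirements are hypothetical.

```python
from importlib import metadata
from typing import Iterable, Iterator, List, Tuple


def installed_packages() -> Iterator[Tuple[str, str]]:
    """Yield (name, version) for distributions visible to the current interpreter."""
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with missing or broken metadata
            yield name, dist.version


def format_requirements(packages: Iterable[Tuple[str, str]]) -> List[str]:
    """Render pinned requirement lines, sorted case-insensitively by name."""
    return sorted((f"{name}=={version}" for name, version in packages),
                  key=str.lower)


if __name__ == "__main__":
    # Redirect this output to a file to share it alongside the analytical programs.
    print("\n".join(format_requirements(installed_packages())))
```

Run inside an activated virtual environment, this lists only what that environment can see, which is what a colleague needs to reproduce it.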
Summing it Up
SAS solutions are a bit like the Sagrada Familia – you know, the church in Barcelona they have been building since 1882? There is always room for an addition, an improvement or a variant. Using current functionality provided by SAS Viya, we create assets that enable multiple purpose-built Python environments, contexts and virtual environments. These assets can be refined, and steps are underway to share them openly for this purpose. One active line of engagement is with SAS R&D teams, in the hope that such capabilities may be more readily available (with functional equivalence, if not in exact form) in the future. Some areas where this capability could be improved are as follows:
- Out-of-the-box, predefined methods to define and access virtual environments
- A better (faster & smoother) transition experience when switching between environments
- Similar context-switching functionality in other SAS applications providing access to Python
- For administrators, more automated and easier manifest creation and governance
- Similar approaches for other open-source programming languages in the analytics domain (e.g. R)
To recap, driven by feedback and a strong inclination for continuity from the open-source user community, we examined ways to improve the open-source interaction experience, concentrating on Python users (a similar approach could be adopted for other languages like R). We addressed a need for quick access to purpose-built environments through the use of SAS compute contexts that seamlessly switch profiles and access multiple Python environments in a non-intrusive manner.
We extended this further with a custom step that provides flexibility to end-users without their requiring administrators’ assistance in installing packages. This is a customizable solution that can be adapted to different Python user personas, preferences and project-specific parameters. Users have the flexibility to control their Python environments, which opens new avenues for testing, rapid development and portable solutions. The entire process is easy and enjoyable, does not pose an additional burden upon administrators and allows access through a variety of methods. We’ll continue to pursue improvement areas and open up our example for further improvement.
The authors thank Jody Steadman, Sr. Software Developer @ SAS, for his assistance with the SAS Configurator for Open Source.