SAS Studio Parallel Process Flows


One of the hidden gems of SAS Studio is the ability to run process flows in parallel. This feature really shines when used in a grid environment. Let’s discuss this one step at a time.

First, what is a process flow? When working in the Visual Programmer perspective, you have access to process flows. A process flow is a graphical representation of a process, where each object, be it a SAS program, a SAS Studio task, a query, and so on, is represented by a node. Nodes are connected by links that instruct SAS Studio how to move from one node to the next.


Display 1. SAS Studio Process Flow

On the Properties tab of the current process flow, you can set the execution mode of the nodes. With the default setting, SAS Studio runs the nodes one at a time, in the order in which they were added to the process flow. If node 2 depends on node 1, node 1 must finish before node 2 starts.

You can change the execution mode to Parallel as shown in Display 2. When this value is set, SAS Studio uses multiple workspace servers to run the nodes concurrently, always enforcing the correct dependencies.
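To make the idea of "run concurrently while enforcing dependencies" concrete, here is a minimal Python sketch. It is not how SAS Studio is implemented; it only illustrates the scheduling concept. The function `run_flow`, the thread pool standing in for workspace server sessions, and the example graph (loosely modeled on the node names mentioned in this post) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_flow(nodes, deps, max_parallel=3):
    """Run every node once all of its predecessors have finished.

    nodes: dict of node name -> zero-argument callable (a hypothetical
           stand-in for a SAS program, task, or query).
    deps:  dict of node name -> set of predecessor node names.
    max_parallel: size of the worker pool (stand-in for extra sessions).
    Returns the node names in the order in which they completed.
    """
    remaining = {name: set(deps.get(name, ())) for name in nodes}
    running = {}   # future -> node name
    finished = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        while remaining or running:
            # Submit every node whose predecessors have all completed.
            ready = [name for name, waiting in remaining.items() if not waiting]
            for name in ready:
                del remaining[name]
                running[pool.submit(nodes[name])] = name
            if not running:
                raise ValueError("dependency cycle or unknown predecessor")
            # Block until at least one running node finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any error raised by the node
                finished.append(name)
                for waiting in remaining.values():
                    waiting.discard(name)
    return finished


# Hypothetical graph: the two filter nodes depend on the partition node,
# while the list node has no predecessors and can run alongside them.
example_nodes = {
    "List Data": lambda: None,
    "Partition Data": lambda: None,
    "Filter Data 1": lambda: None,
    "Filter Data 2": lambda: None,
}
example_deps = {
    "Filter Data 1": {"Partition Data"},
    "Filter Data 2": {"Partition Data"},
}
print(run_flow(example_nodes, example_deps))
```

The completion order can vary between runs, but the partition node always finishes before either filter node starts, which is the guarantee the parallel execution mode gives you.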


Display 2. Setting the Execution Mode to Parallel

When you use this feature in a SAS Grid environment, if the administrator has configured the workspace server sessions to be grid launched, you can achieve the benefits and the performance improvements of multi-machine parallel load balancing, without having to code any SAS/CONNECT statement. It’s a real point-and-click parallel execution engine!

Display 3 shows the process flow presented in Display 1 running in parallel execution mode. The pane is grayed out because it is not possible to interact with it until the execution is complete.

We can see that the List Data node is still running, while the Partition Data node has already finished. Thus, the two Filter Data nodes were able to start in parallel.


Display 3. Tasks Running in Parallel in a Process Flow

In this scenario, we would guess that three workspace server sessions are concurrently running our code. However, if we monitor what is happening on the back-end hosts, we notice something unexpected: there are actually five workspace server sessions running. Why? As soon as you sign in to SAS Studio, it starts two SAS sessions. These are used only for the default execution mode. If a process flow is run in parallel mode, up to three additional SAS sessions are started, for a total of five. Once the process flow is finished, the three additional SAS sessions terminate after 30 seconds without further activity, in order to release resources.

An administrator can use a configuration property, webdms.maxParallelWorkspaces, to specify the maximum number of workspace server sessions that SAS Studio can use when running a process flow in parallel mode. The default value is 3. The maximum value is 8.
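The session arithmetic above can be written down as a small Python sketch. The constants come from this post (two sessions at sign-in, a default of 3 and a maximum of 8 for webdms.maxParallelWorkspaces). The function name `max_session_count` is hypothetical, and rejecting out-of-range values is an assumption made for the sketch, not documented SAS Studio behavior.

```python
BASE_SESSIONS = 2          # sessions started at sign-in, per this post
DEFAULT_MAX_PARALLEL = 3   # default of webdms.maxParallelWorkspaces
HIGHEST_MAX_PARALLEL = 8   # largest value the property accepts, per this post


def max_session_count(concurrent_nodes, max_parallel=DEFAULT_MAX_PARALLEL):
    """Upper bound on workspace server sessions for one signed-in user.

    concurrent_nodes: how many nodes could run at the same time.
    max_parallel: the configured webdms.maxParallelWorkspaces value.
    """
    # Assumption: out-of-range settings are rejected in this sketch.
    if not 0 <= max_parallel <= HIGHEST_MAX_PARALLEL:
        raise ValueError("max_parallel must be between 0 and 8")
    if concurrent_nodes < 0:
        raise ValueError("concurrent_nodes cannot be negative")
    extra_sessions = min(concurrent_nodes, max_parallel)
    return BASE_SESSIONS + extra_sessions
```

With the defaults, `max_session_count(3)` gives the five sessions observed in the scenario above. Raising the property to its maximum of 8 would allow up to ten sessions per user, which is worth keeping in mind when sizing a grid.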

I hope you enjoy running multiple tasks concurrently. If you have already started using parallel processing, you might want to check out my earlier blog, How to avoid the pitfalls of parallel jobs!

NOTE: Some of this content comes from my SAS Global Forum 2016 presentation, SESSION ID SAS-6422 Deep Dive with SAS® Studio into SAS® Grid Manager 9.4. If you're attending the conference and would like to learn more on this topic, please plan to attend.


About Author

Edoardo Riva

Principal Technical Architect

Edoardo Riva is a Principal Technical Architect in the Global Enablement and Learning (GEL) Team within SAS R&D's Global Technical Enablement Division.

7 Comments

  1. Very interesting, thanks for sharing. This new feature applies only to the latest SAS Studio release, 3.5. Unfortunately, the pictures aren't clickable in my browser sessions.

  2. Thanks a lot. You are proving me wrong: it does work in 3.4. Very useful feature :) Waiting for the conversion engine { SEG Flow *.egp } => { SAStudio *.CTL } ...

  3. Nice feature! When running in a grid environment, does it use multiple workspace server sessions on the same grid node or multiple grid nodes? I'm assuming the same but wanted to check. Also, do the sessions share the same WORK library, or are they separate WORK libraries like an RSUBMIT?

    • Edoardo Riva on

      Michelle,
      when running in a grid environment, it is up to the grid software to decide where to start each session, assuming you have configured grid-launched workspace servers. Thus, there is a good chance that the workspace server sessions will end up on different nodes.
      Whether the workspace servers are started on the same node or on different nodes, their WORK libraries are always different. That's why I suggest reading the other post linked at the end.
      In this specific case, when using process flows in the visual programmer perspective, there is actually a "WORK-like" library that is automatically created by SAS Studio and shared across all the workspace servers (assuming you are using a shared filesystem). It is called WEBWORK and can be used to share data across sessions. SAS Studio will take care of deleting it when you log off, just like the regular WORK directory.
