You'll notice several changes in SAS Grid Manager with the release of SAS 9.4M6.
For the first time, you can get a grid solution entirely written by SAS, with no dependence on any external or third-party grid provider.
This post gives a brief architectural description of the new SAS grid provider, including all major components and their roles. The “traditional” SAS Grid Manager for Platform has seen some architectural changes too; they are detailed at the bottom.
A new kid in town
SAS Grid Manager is a complex offering, composed of different layers of software. The following picture shows a very simple, high-level view. SAS Infrastructure here represents the SAS Platform, for example the SAS Metadata Server, SAS Middle Tier, etc. They service the execution of computing tasks, whether a batch process, a SAS Workspace server, and so on. In a grid environment these computing tasks are distributed on multiple hosts, and orchestrated/managed/coordinated by a layer of software that we can generically call Grid Infrastructure or Grid Middleware. That’s basically a set of lower-level components that sit between computing processes and the operating system.
Since its initial design more than a decade ago, the SAS Grid Manager offering has always been able to leverage different grid infrastructure providers, thanks to an abstraction layer that makes them transparent to end-user client software.
Our strategic grid middleware has been, since the beginning, Platform Suite for SAS, provided by Platform Computing (now part of IBM).
A few years ago, with the release of SAS 9.4M3, SAS started delivering an additional grid provider, SAS Grid Manager for Hadoop, tailored to grid environments co-located with Hadoop.
The latest version, SAS 9.4M6, opens up choices with the introduction of a new, totally SAS-developed grid provider. What’s its name? Well, since it’s SAS’s grid provider, we use the simplest one: SAS Grid Manager. To avoid confusion, what we used to call SAS Grid Manager has been renamed SAS Grid Manager for Platform.
The reasons for a choice
The SAS-developed provider for SAS Grid Manager:
• Is streamlined specifically for SAS workloads.
• Is easier to install (simply use the SAS Deployment Wizard (SDW)) and administer.
• Extends workload management and scheduling capabilities into other technologies, such as
o Third-party compute workloads like open source.
o SAS Viya (in a future release).
• Reduces dependence of SAS Grid Manager on third-party technologies.
So what are the new components?
The SAS-developed provider for SAS Grid Manager includes:
• SAS Workload Orchestrator
• SAS Job Flow Scheduler
• SAS Workload Orchestrator Web Interface
• SAS Workload Orchestrator Administration Utility
These are the new components, delivered together with others also available in previous releases and with other providers, such as the Grid Manager Thin Client Utility (a.k.a. SASGSUB), the SAS Grid Manager Agent Plug-in, etc. Let’s see these new components in more detail.
SAS Workload Orchestrator
The SAS Workload Orchestrator is your grid controller. Just like Platform’s LSF in SAS Grid Manager for Platform, it:
• Dispatches jobs.
• Monitors hosts and spreads the load.
• Is installed and runs on all machines in the cluster (but is not required on dedicated Metadata Server or Middle-tier hosts).
A notable difference, when compared to LSF, is that the SAS Workload Orchestrator is a single daemon, with its configuration stored in a single text file in JSON format.
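To give a feel for why a single JSON file is convenient, here is a minimal Python sketch that parses such a file and summarizes it. The key names (hosts, queues, maxJobs, priority) and host names are invented for illustration only; they are an assumption, not the real schema of the SAS Workload Orchestrator configuration file.

```python
import json

# Hypothetical configuration text. The key names are invented for this
# example and are NOT the product's actual schema.
SAMPLE_CONFIG = """
{
  "hosts": [
    {"name": "grid1.example.com", "maxJobs": 8},
    {"name": "grid2.example.com", "maxJobs": 4}
  ],
  "queues": [
    {"name": "normal", "priority": 10},
    {"name": "night", "priority": 5}
  ]
}
"""


def summarize(config_text):
    """Return (total job slots, sorted queue names) from the JSON text."""
    config = json.loads(config_text)
    total_slots = sum(host["maxJobs"] for host in config["hosts"])
    queue_names = sorted(queue["name"] for queue in config["queues"])
    return total_slots, queue_names


if __name__ == "__main__":
    slots, queues = summarize(SAMPLE_CONFIG)
    print(f"{slots} job slots across queues: {', '.join(queues)}")
```

Because everything lives in one plain-text document, standard tooling (any JSON parser, diff, version control) can inspect or validate it without provider-specific utilities.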
Redeveloped for modern workloads, the new grid provider can schedule more types of jobs, beyond just SAS jobs. In fact, you can use it to schedule ANY job, including open source code running in Python, R or any other language.
SAS Job Flow Scheduler
SAS Job Flow Scheduler is the flow scheduler for the grid (just as Platform Process Manager is with SAS Grid Manager for Platform):
• It passes commands to the SAS Workload Orchestrator at certain times or events.
• Flows can be used to run many tasks in parallel on the grid.
• Flows can also be used to determine the sequence of events for multiple related jobs.
• It determines only when jobs are submitted to the grid; they may not run immediately if the right conditions are not met (hosts too busy, closed, etc.).
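The two roles of a flow described above (running independent tasks in parallel and sequencing related jobs) can be illustrated with a small Python sketch. It is a generic dependency-ordering example, not the SAS Job Flow Scheduler's actual algorithm; the job names and the execution_waves function are hypothetical.

```python
def execution_waves(dependencies):
    """Group jobs into waves.

    Jobs in the same wave have all their prerequisites satisfied by earlier
    waves, so they could be submitted to the grid in parallel.
    `dependencies` maps each job name to the list of jobs it waits for.
    """
    remaining = {job: set(deps) for job, deps in dependencies.items()}
    done = set()
    waves = []
    while remaining:
        # A job is ready once every job it depends on has completed.
        ready = sorted(job for job, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in flow")
        waves.append(ready)
        done.update(ready)
        for job in ready:
            del remaining[job]
    return waves


# Hypothetical flow: two extracts feed one transform, which feeds a report.
FLOW = {
    "extract_a": [],
    "extract_b": [],
    "transform": ["extract_a", "extract_b"],
    "report": ["transform"],
}

if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(FLOW), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Note that this only decides submission order. As the last bullet says, whether a submitted job starts right away is still up to the grid.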
The SAS Job Flow Scheduler provides flow orchestration of batch jobs. It relies on operating system services to trigger the flow and to impersonate the user when it is time for the flow to start execution.
A flow can be built using the SAS Management Console or other SAS products such as SAS Data Integration Studio.
SAS Job Flow Scheduler includes the ability to run a flow immediately (a.k.a. “Run Now”), or to schedule the flow for some future time/recurrence.
SAS Job Flow Scheduler consists of different components that cooperate to execute flows:
• SASJFS service is the main running service that handles the requests to schedule a flow. It runs on the middle tier as a dedicated thread in the Web Infrastructure Platform, deployed inside sasserver1. It uses services provided by the data store (SAS Content Server) and Metadata Server to read/write the configuration options of the scheduler, the content of the scheduled flows and the history records of executed flows.
• Launcher acts as a gateway between SASJFS and OS Trigger. It is a daemon that accepts HTTP connections using basic authentication (username/password) to start the OS Trigger program as the scheduled user. This avoids the requirement to save end-users’ passwords in the grid provider, for both Windows and Unix.
• OS Trigger is a stand-alone Java program that uses the services of the operating system to handle the triggering of the scheduled flow by providing a call to the Job Flow Orchestrator. On Windows, it uses the Windows Task Scheduler; on UNIX, it uses cron or crontab.
• Job Flow Orchestrator is a stand-alone program that manages the flow orchestration. It is invoked by the OS scheduler (as configured by the OS Trigger) with the id of the flow to execute, then it connects to the SASJFS service to read the flow information, the job execution configuration and the credentials to connect to the grid. With that information, it sends jobs for execution to the SAS Workload Orchestrator. Finally, it is responsible for providing the history record for the flow back to SASJFS service.
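To make the OS Trigger idea more concrete on UNIX, the following Python sketch builds a standard five-field crontab line that would start a command with a flow id every day at a given time. The script path and flow id are hypothetical placeholders; the real entries written by the OS Trigger, and the real Job Flow Orchestrator command line, are not documented in this post.

```python
import shlex


def daily_crontab_line(minute, hour, command, flow_id):
    """Build a crontab line: minute hour day-of-month month day-of-week command.

    `command` and `flow_id` are hypothetical; they stand in for whatever
    program and flow identifier the OS scheduler is asked to run.
    """
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute must be 0-59 and hour 0-23")
    # cron treats an unescaped '%' in the command as a newline, so reject it.
    if "%" in command or "%" in flow_id:
        raise ValueError("'%' is special in crontab commands")
    return f"{minute} {hour} * * * {shlex.quote(command)} {shlex.quote(flow_id)}"


if __name__ == "__main__":
    # Run the (made-up) orchestrator script for flow 'flow-42' daily at 02:30.
    print(daily_crontab_line(30, 2, "/opt/example/run_flow.sh", "flow-42"))
```

Delegating the timing to cron (or to the Windows Task Scheduler) means the scheduler itself does not need a long-running timer process per flow: the operating system wakes the orchestrator with the flow id when the time comes.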
SAS Grid Manager provides additional components to administer the SAS Workload Orchestrator:
• SAS Workload Orchestrator Web Interface
• SAS Workload Orchestrator Administration Utility
Both can monitor jobs, queues, hosts, services, and logs, and configure hosts, queues, services, user groups, and user resources.
The SAS Workload Orchestrator Web Interface is a web application hosted by the SAS Workload Orchestrator process on the grid master host; it can be proxied by the SAS Web Server to always point to the current master in case of failover.
The SAS Workload Orchestrator Administration Utility is an administration command-line interface; it has a similar syntax to SAS Viya CLIs and is located in the directory /Lev1/Applications/GridAdminUtility. A sample invocation to list all running jobs is:
sas-grid-cli show-jobs --state RUNNING
What has not changed
Describing what has not changed with the new grid provider is an easy task: everything else.
Obviously, this is a very generic statement, so let’s call out a few noteworthy items that have not changed:
• User experience is unchanged. SAS programming interfaces to grid have not changed, apart from the lower-level libraries to connect to the new provider. As such, you still have the traditional SAS grid control server, SAS grid nodes, SAS thin client (aka SASGSUB) and the full SAS client (SAS Display Manager). Users can submit jobs or start grid-launched sessions from SAS Enterprise Guide, SAS Studio, SAS Enterprise Miner, etc.
• A directory shared among all grid hosts is still required to share the grid configuration files.
• A high-performance, clustered file system for the SASWORK area and for data libraries is mandatory to guarantee satisfactory performance.
What about SAS Grid Manager for Platform?
The traditional grid provider, now rebranded as SAS Grid Manager for Platform, has seen some changes as well with SAS 9.4M6:
• The existing management interface, SAS Grid Manager for Platform Module for SAS Environment Manager, has been completely re-designed. The user interface has completely changed, although the functions provided remain the same.
• Grid Management Services (GMS) is not updated to work with the latest release of LSF. Therefore, the SAS Grid Manager plug-in for SAS Management Console is no longer supported. However, the plug-in is included with SAS 9.4M6 if you want to upgrade to SAS 9.4M6 without also upgrading Platform Suite for SAS.
You can find more comprehensive information in these doc pages:
• What’s New in SAS Grid Manager 9.4
• Grid Computing for SAS Using SAS Grid Manager (Part 2) section of Grid Computing in SAS 9.4
Does SAS Grid Manager also contain a tool similar to Flow Manager, where users can monitor job statuses and history, and cancel or rerun jobs if necessary?
Rain, currently the new scheduler included with SAS Grid Manager does not have any administrative client.
The server stores job status and history, but there is no interface to access them.
We currently run SAS 9.4M3 with Platform Suite for SAS (LSF 9.1.3). If we would like to upgrade to SAS 9.4M6 with SAS Grid Manager, what is the best way to do it?
I suggest you get in touch with your SAS account rep to discuss your main motivations for the change. You cannot switch in place from one grid provider (LSF) to another (SAS). You would have to install a new, different environment, and evaluate whether to migrate content and, if so, which content. Maybe your best option is to upgrade to M6 while staying on LSF.
Regarding the upgrade question: we are on SAS 9.4M5 and planning to upgrade to SAS 9.4M6. Will the new SAS Grid Manager provider be installed, or will the old LSF platform be retained?
Could you please clarify? Thank you.
When you upgrade an existing SAS Grid to 9.4M6 from a previous release, it will maintain the existing technology, i.e. Platform Suite for SAS (including LSF and the Process Manager scheduler).
Indeed great news Eduardo! One less dependency on a third party product. But also a challenge. We depend heavily on LSF (Platform) for scheduling and grid management. At the same time we also depend on a proper working of Kerberos (Active Directory) throughout. Mainly for downstream authentication to Teradata, but also other data sources use it. A grid launched workspace server must be accompanied by a valid ticket. And when running scheduled flows it is important to keep the tickets valid even beyond the maximum renewal time because the owner may not be around for a while. LSF does that in several ways. Does the SAS Job Flow Scheduler also handle Kerberos in all or some of these scenarios?
PS: The name raises one more issue: it is not very distinctive. Its generic name makes it very difficult to compose a Google search that will exclude results for any of the other scheduling servers.
I'm glad to hear that, by removing external dependencies, we went down a path that resonates with our customers!
Kerberos integration is fully supported, end to end, when submitting jobs to SAS Workload Orchestrator. After proper configuration:
- end users can submit jobs using Kerberos credentials.
- SAS Workload Orchestrator keeps Kerberos credentials renewed while jobs are pending and while they are running.
- SAS Workload Orchestrator starts jobs (including SAS workspace servers) using Kerberos credentials so that spawned jobs have these Kerberos credentials available for downstream authentication.
For the 9.4M6 version, Kerberos authentication is not available when scheduling using the SAS Job Flow Scheduler. It is on the development team radar; you can expect it to be available in a future release.
But to keep the naming consistent, it should be called "SAS Grid Manager for SAS"...
Also, I tried to evaluate SAS Job Flow Scheduler by reading the online doc, but unfortunately I find it quite scarce.
Is there any other source of information on its exact capabilities, with examples of how to build and execute flows?
Linus, great point on the name! I’m not an expert, but know there’s lots to consider when naming a product.
Regarding scheduling, have you read the "Scheduling in SAS® 9.4" guide?
It has a specific section for Setting Up Scheduling Using SAS Job Flow Scheduler Scheduling.
An important point to remember is that, as I wrote above, "User experience is unchanged". As such, to build and execute flows, you follow the same steps as you always did, for example as documented in the guide in the sections Scheduling Jobs Using Schedule Manager or Scheduling Jobs from SAS Data Integration Studio.