In automated production (or business operations) environments, we often run SAS job flows in batch mode and on a schedule. A SAS job flow is a collection of several interdependent SAS programs executed as a single process.
In my earlier posts, Running SAS programs in batch under Unix/Linux and Let SAS write batch scripts for you, I described how you can run SAS programs in batch mode by creating UNIX/Linux scripts that in turn invoke other scripts.
In this scenario you can run multiple SAS programs sequentially or in parallel, all while having a single root script kicked off on schedule. The whole SAS processing flow runs like a chain reaction.
Why and when to stop SAS batch flow process
However, sometimes we need to automatically stop that chain of job executions if certain criteria are met (or not met) in one of the programs of the flow.
See also: How to conditionally stop SAS code execution and gracefully terminate SAS session
Let’s say our first job in a batch flow is a data preparation step (ETL) where we extract data tables from a database and prepare them for further processing. The rest of the batch process is dependent on successful completion of that critical first job. The process is kicked off at 3:00 a.m. daily. However, sometimes we run into a situation where the database connection is unavailable, or the database itself has not finished refreshing, or something else happens that causes the ETL program to complete with ERRORs.
This failure means that our data has not updated properly and there is no reason to continue running the remainder of the job flow process as it might lead to undesired or even disastrous consequences. In this situation we want to automatically terminate the flow execution and send an e-mail notification to the process owners and/or SAS administrators informing them about the mishap.
How to stop SAS batch flow process in UNIX/Linux
Suppose we run the following main.sh script on UNIX/Linux:
    #!/bin/sh
    #1 extract data from a database
    /sas/code/etl/etl.sh
    #2 run the rest of processing flow
    /sas/code/processing/tail.sh
The etl.sh script runs the SAS ETL process as follows:
    #!/bin/sh
    dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
    pgmname="/sas/code/etl/etl.sas"
    logname="/sas/code/etl/etl_$dtstamp.log"
    /sas/SASHome/SASFoundation/9.4/sas $pgmname -log $logname
We want to run the tail.sh shell script (which itself runs multiple other scripts) only if the SAS ETL program etl.sas, which is run by etl.sh, completes with no ERRORs or WARNINGs. Otherwise, we want to terminate the main.sh script and not run the rest of the processing flow.
To do this, we re-write our main.sh script as:
    #!/bin/sh
    #1 extract data from a database
    /sas/code/etl/etl.sh
    exitcode=$?
    echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
    if [ $exitcode -eq 0 ]
    then
        #2 run the rest of processing flow
        /sas/code/processing/tail.sh
    fi
In this code, we use a special shell variable ($? in the Bourne and Korn shells, $status in the C shell) to capture the exit status code of the previously executed OS command, /sas/code/etl/etl.sh:
exitcode=$?
Then the optional echo command just prints the captured value of that status for our information.
Every UNIX/Linux command executed by a shell script or a user has an exit status represented by an integer in the range of 0-255. An exit code of 0 means the command executed successfully without any errors; a non-zero value means the command failed.
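To see exit statuses in action, here is a minimal sketch that uses only standard commands (true, false, and a subshell that exits with 3, a code chosen purely for illustration):

```shell
#!/bin/sh
# 'true' always succeeds, so its exit status is 0
true
status_ok=$?

# 'false' always fails, so its exit status is non-zero (typically 1)
false
status_fail=$?

# a subshell can return any status we choose; 3 is an arbitrary example
(exit 3)
status_custom=$?

echo "true=$status_ok false=$status_fail custom=$status_custom"
```

Each status is saved right after the command that produced it, which is the same pattern main.sh uses for etl.sh.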
The SAS System plays nicely with the UNIX/Linux operating system. The automatic numeric macro variable SYSCC contains the current condition code that SAS returns to your operating environment (the operating environment condition code).
According to the SAS documentation Determining the Completion Status of a SAS Job in UNIX Environments, a SAS job returns its exit status code the same way a shell command does: in the special shell variable ($? for the Bourne and Korn shells, and $status for the C shell). A value of 0 indicates successful termination. For additional flexibility, the SAS ABORT statement with an optional integer argument allows you to specify a custom exit status code.
The following table summarizes the values of the SAS exit status code:
Condition | Exit Status Code |
---|---|
All steps terminated normally | 0 |
SAS issued WARNINGs | 1 |
SAS issued ERRORs | 2 |
User issued ABORT statement | 3 |
User issued ABORT RETURN statement | 4 |
User issued ABORT ABEND statement | 5 |
SAS could not initialize because of a severe error | 6 |
User issued ABORT RETURN - n statement | n |
User issued ABORT ABEND - n statement | n |
Note: Exit status codes of 0–6 and greater than 977 are reserved for use by SAS.
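If you want the status echoed by your script to be self-explanatory, the table can be turned into a small lookup. This is only a sketch: the function name describe_sas_status is made up for illustration and is not part of SAS.

```shell
#!/bin/sh
# describe_sas_status: hypothetical helper (not part of SAS) that maps
# a SAS exit status code to the condition listed in the table
describe_sas_status() {
  case "$1" in
    0) echo "All steps terminated normally" ;;
    1) echo "SAS issued WARNINGs" ;;
    2) echo "SAS issued ERRORs" ;;
    3) echo "User issued ABORT statement" ;;
    4) echo "User issued ABORT RETURN statement" ;;
    5) echo "User issued ABORT ABEND statement" ;;
    6) echo "SAS could not initialize because of a severe error" ;;
    *) echo "Other code $1 (for example, n from ABORT RETURN n or ABORT ABEND n)" ;;
  esac
}

describe_sas_status 2
```

In a real script you would call it with the captured status, for example describe_sas_status "$exitcode".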
Since our etl.sh script executes the SAS program etl.sas as its last command, the exit status code returned by SAS becomes the exit status of etl.sh and is consequently available to our main.sh shell script.
Then, in the main.sh script, we check whether that exit code equals 0, and only then run the remaining flow by executing the tail.sh shell script. Otherwise, we skip tail.sh, and main.sh simply reaches its end and exits.
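This hand-off can be checked without SAS. The sketch below assumes a temporary stand-in for etl.sh whose last command is a subshell exiting with 2 (mimicking a SAS run that ended with ERRORs); the stand-in and the use of mktemp are illustration only.

```shell
#!/bin/sh
# Stand-in for etl.sh: its last command is a subshell that exits with 2,
# mimicking a SAS run that finished with ERRORs (no real SAS call here)
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/sh
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
(exit 2)   # placeholder for the sas command line
EOF

# Run the stand-in and capture its exit status, as main.sh does
sh "$wrapper"
exitcode=$?
rm -f "$wrapper"

echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
```

A script's exit status is that of the last command it runs. If etl.sh ever ran further commands after the sas call (log cleanup, for example), the status would need to be saved right after sas and passed to exit explicitly.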
Alternatively, the main.sh script can be implemented with an explicit exit as follows:
    #!/bin/sh
    #1 extract data from a database
    /sas/code/etl/etl.sh
    exitcode=$?
    echo "Status=$exitcode (0=SUCCESS,1=WARNING,2=ERROR)"
    if [ $exitcode -ne 0 ]
    then
        exit
    fi
    #2 run the rest of processing flow
    /sas/code/processing/tail.sh
In this shell script example, we check the exit code value, and if it is NOT equal to 0, we explicitly terminate the main.sh shell script using the exit command, which gets us out of the script immediately without executing the subsequent commands. In this case, our #2 command invoking the tail.sh script never gets executed, which effectively stops the batch flow process.
If you also need to automatically send an e-mail notification to the designated people about the failed batch flow process, you can do it in a separate SAS job that runs right before the exit command. Then the if-statement will look something like this:
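One detail you may want to consider: exit without an argument returns the status of the last command executed, which here is the successful [ test, so main.sh itself would report 0 to whatever launched it. If the scheduler should see the failure, the captured code can be passed along. A minimal sketch, where run_flow is a hypothetical stand-in for main.sh and its two arguments stand in for etl.sh and tail.sh:

```shell
#!/bin/sh
# run_flow: hypothetical stand-in for main.sh; the body runs in a subshell
# (parentheses) so that 'exit' ends only the flow, not the calling shell.
# $1 = command for the ETL step, $2 = command for the rest of the flow
run_flow() (
  $1
  exitcode=$?
  if [ "$exitcode" -ne 0 ]
  then
    # a bare 'exit' would return the status of the [ test above (0);
    # passing $exitcode preserves the original failure code
    exit "$exitcode"
  fi
  $2
)

# simulate a failing ETL step with 'false' and a tail step with 'echo'
run_flow false "echo tail step ran"
echo "flow status=$?"
```

In main.sh itself, this amounts to writing exit $exitcode instead of exit.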
    if [ $exitcode -ne 0 ]
    then
        # send an email and exit
        /sas/code/etl/email_etl_failure.sh
        exit
    fi
That is, immediately after the email is sent, the shell script and the whole batch flow process get terminated by the exit command; no shell script commands beyond that if-statement will be executed.
A word of caution
Be extra careful if you use the special script variable $? directly in a script's logical expression, without assigning it to an interim variable. For example, you could use the following script command sequence:
    /sas/code/etl/etl.sh
    if [ $? -ne 0 ]
    . . .
However, let’s say you insert another script command between them, for example:
    /sas/code/etl/etl.sh
    echo "Status=$? (0=SUCCESS,1=WARNING,2=ERROR)"
    if [ $? -ne 0 ]
    . . .
Then the $? variable in the if [ $? -ne 0 ] statement will hold the exit code of the preceding echo command, not of the /sas/code/etl/etl.sh command as you might expect.
Hence, I suggest capturing the $? value in an interim variable (e.g. exitcode=$?) right after the command whose exit code you are going to inspect, and then referencing that interim variable (as $exitcode) in your subsequent script statements. That will save you from the trouble of inadvertently referring to the wrong exit code when you insert additional commands during script development.
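The difference is easy to reproduce. In this minimal sketch the false command stands in for a failing etl.sh, and the variable names lost and kept are illustrative only:

```shell
#!/bin/sh
# Pitfall: echo succeeds, so it resets $? to 0 before the second read
false
echo "Status=$?" > /dev/null
lost=$?          # 0: the status of echo, not of 'false'

# Safe pattern: capture the status immediately after the command
false
exitcode=$?
echo "Status=$exitcode" > /dev/null
kept=$exitcode   # still the status of 'false'

echo "lost=$lost kept=$kept"
```

The first value is always 0 because echo succeeded; the second keeps the non-zero status of the failing command.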
Your thoughts
What do you think about this approach? Did you find this blog post useful? Did you ever need to terminate your batch job flow? How did you go about it? Please share with us.
16 Comments
This is also great. I'm going to try it ASAP.
Thank you, John. Would love to hear how it worked out for you.
Hi Leonid, thank you for this article! I found it very helpful and have a question.
For the scenario where status code 1 is an acceptable value, how can we conditionally set a warning return status in SAS without aborting? For example, in my data step I have the following:
Then in my script I check the status code:
Yes, Samantha, you can conditionally generate WARNING as well as ERROR messages. However, if you don't want to terminate your job in case of a WARNING (exit code = 1), your script part should be
Also see How to conditionally stop SAS code execution and gracefully terminate SAS session which you might find helpful.
What does this mean at the top of your shell script, "#!/bin/sh" ?
There is a variety of different shell script interpreters on Unix-like Operating Systems. This first line of the script (it is called "sha-bang line") tells the operating system which command interpreter should be used to interpret the script.
In particular, #!/bin/sh executes the script using the Bourne shell or a compatible shell, with path /bin/sh.
For more information, see About /bin/sh or (#!/bin/bash ) What exactly is this ?.
Great article, Leonid!
Thank you, Kirk, it's very nice of you!
A hodgepodge of comments...
1) For a really full featured, relatively inexpensive scheduler, I highly recommend JAMS: https://www.jamsscheduler.com/. No association with JAMS other than a very happy customer. They have a free trial, and a rich workflow engine that allows setting up parallel and serial processing, email notification, and restart from point of failure.
2) I once wrote an execution process (not really a scheduler) which used Excel --> SAS dataset, which would run SAS programs in parallel when there were no dependencies, and serially when there were upstream dependencies. It used call system and waitfor commands in SAS. So, there was one main SAS program driving the execution of the rest of the ETL, and it would halt downstream processing if any of the parallel "group" of programs had an error, or if an upstream serial program had an error. I can try to dig up the code should this be useful.
3) Both Powershell (which now runs on Linux) and Python have modules that allow you to run external programs in parallel (including thread limiting) and halting downstream processing if there is an upstream error. Again, I can find a Powershell example if that would be useful. I don't have the Python code, I've only read about it. Would be a fun exercise though 🙂
Thank you, Scott, for your comments. It's a great addition to the topic of this post. If you find your implementation code, you are welcome to share it here, I am sure our readers will appreciate it.
Great info, as always, Leonid. One question - In Linux, is there an option to use a gt, ge, lt, or le instead of an eq or ne? For instance, say you didn't care about warnings and wanted the ETL to keep running in the event of seeing a 0 or 1 status code - is that possible?
Thank you, Kerri. Yes, sure you can use any of them (-eq, -ne, -gt, -ge, -lt, -le) - see these Bash scripting binary comparison operators. You also raised an excellent question expanding the scenario to 0 and 1 as acceptable values. It depends on whether warnings are acceptable or not. Ideally, we should strive to eliminate all warnings by proper coding. However, sometimes warnings have nothing to do with the quality of our code; for example, you can get a warning that your SAS software is expiring soon and you need to update your license.
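For instance, a minimal sketch of a check that treats both 0 and 1 as acceptable, using -le (the function name flow_may_continue is made up for illustration):

```shell
#!/bin/sh
# flow_may_continue: hypothetical helper that succeeds when the SAS
# exit status is 0 (success) or 1 (WARNINGs), and fails for 2 and above
flow_may_continue() {
  [ "$1" -le 1 ]
}

for exitcode in 0 1 2
do
  if flow_may_continue "$exitcode"
  then
    echo "Status=$exitcode: continue the flow"
  else
    echo "Status=$exitcode: stop the flow"
  fi
done
```

In main.sh the same test would simply read if [ $exitcode -le 1 ].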
I am missing the reference to &SYSCC. Coding an abort is not always applicable (batch/interactive). At the end of the program a check can be made. &SYSCC can be modified in your own SAS program. Another point is that trying to analyse log files to solve the different behavior in interactive/batch mode easily gets overcomplicated and introduces possible weird effects in scripting dependencies.
https://documentation.sas.com/?docsetId=mcrolref&docsetTarget=p11nt7mv7k9hl4n1x9zwkralgq1b.htm&docsetVersion=9.4&locale=us
Great suggestion, Ja, thank you. I added that reference in the blog post. However, it might be a moot point for the scenario of this blog post, as we are controlling the flow of execution between different SAS programs comprising the job flow, not the flow of execution within a single SAS program, which might be controlled by the SYSCC macro variable.
Very useful. Recently when I've needed to terminate a job (batch or interactive), I've been playing with the ABORT statement. There are a bunch of options that allow you to control how it works. So far, I've been happy with abort cancel. It also forces a non-zero exit status, so would work nicely with shell script approach.
Thank you, Quentin, for sharing your experience. In addition to what ABORT CANCEL does, ABORT CANCEL RETURN n allows you to pass n-value to the batch script which can further control the flow execution scenario.