Running SAS programs in batch under Unix/Linux


While SAS program development is usually done in an interactive SAS environment (SAS Enterprise Guide, SAS Display Manager, SAS Studio, etc.), running SAS programs in a production or operations environment is routinely done in batch mode.

Related Content: Automating SAS processes using Windows batch files
Related Content: Let SAS write batch scripts for you

Why run SAS programs in batch mode?

First and foremost, this is done for automation, as a batch process does not require human participation at run time. It can be scheduled to run (using the operating system scheduler or other scheduling software) while we sleep, at any time of day or at any regular interval between consecutive runs.

Running SAS programs in batch mode streamlines SAS processing: it eliminates the human error of manual submission and lets you submit multiple SAS jobs (programs) either all at once or in a sequence that respects program and data dependencies.

SAS batch processing also takes care of self-documenting, as it automatically generates and stores SAS logs and outputs.

Imagine the following scenario. Every night, a SAS batch process “wakes up” at 3 a.m. and runs an ETL process on a SAS Application server that extracts multiple tables from a database, transforms, combines, and loads them into a SAS datamart; then moves some data tables across the network and loads them into SAS LASR server, so when you are back to work in the morning your SAS Visual Analytics application has all its data refreshed and ready to roll. Of course, the process schedule can be custom-tailored to your particular needs; your batch jobs may run every 15 minutes, once a week, every first Friday of the month – you name it.

What is a batch script file?

To submit a single SAS program in batch mode manually, you could submit an OS command that looks something like the following:

Unix/Linux

sas /sas/code/proj1/job1.sas -log /sas/code/proj1/job1.log

DOS/Windows

"C:\Program Files\SASHome\SASFoundation\9.4\Sas.exe" -SYSIN c:\proj1\job1.sas -NOSPLASH -ICON -LOG c:\proj1\job1.log

However, submitting an OS command manually has too many drawbacks: it’s too much typing, it only submits one SAS program at a time, and most importantly – it is manual, which means it is prone to human error.

Usually, these OS commands are packaged into so-called batch files (shell scripts in Unix) that allow for sequential, parallel, and conditional execution of multiple OS commands. They can be run either manually or automatically (on a schedule, or called by other batch scripts).

In a Windows/DOS Operating System, these script files are called batch files and have .bat filename extensions. In Unix-like operating systems, such as Linux, these script files are called shell scripts and have .sh filename extensions.

Since Windows batch files are similar to, but slightly different from, Unix (and its open source cousin Linux) shell scripts, the examples below use Unix/Linux shell scripts only, in order to avoid any confusion. We also use the terms Unix and Linux interchangeably.

Here is the typical content of a Linux shell script file to run a single SAS program:

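#!/usr/bin/sh
# Date/time stamp that makes each log file name unique
dtstamp=$(date +%Y.%m.%d_%H.%M.%S)
pgmname="/sas/code/project1/program1.sas"
logname="/sas/code/project1/program1_$dtstamp.log"
# Run the SAS program in batch and write its log to $logname
/sas/SASHome/SASFoundation/9.4/sas "$pgmname" -log "$logname"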

Note that shell script syntax allows for some basic programming features such as a current date/time function, formatting, and variables. It also provides conditional processing in the form of if-then-else logic. For detailed information on the shell scripting language, refer to a Bash shell scripting tutorial or to documentation for any of the other shell dialects (C shell, Korn shell, etc.).

Let’s save the above shell script as the following file:
/sas/code/project1/program1.sh

How to submit a SAS program via Unix script

In order to run this shell script we would submit the following Linux command:
/sas/code/project1/program1.sh

Or, if we navigate to the directory first:
cd /sas/code/project1

then we can submit an abbreviated Linux command:
./program1.sh

When run, this shell script not only executes a SAS program (program1.sas), but for every run it also creates and saves a uniquely named SAS log file. You may create the SAS log file in the same directory where the SAS code is stored, as specified in the shell script above, or specify another directory of your choice.

For example, it creates the following SAS log file:
/sas/code/project1/program1_2017.12.06_09.15.20.log

The file name uniqueness is achieved by adding a date/time stamp suffix between the SAS program name and .log file name extension, in this particular case indicating that this SAS log file was created on December 6, 2017, at 09:15:20 (hours:minutes:seconds).
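
The date/time-stamped log name and the if-then-else logic mentioned earlier can be combined in one place. The following is a minimal sketch, not the script used above: the function names make_logname and run_sas_job and the SAS_CMD variable are hypothetical, and reading exit status 0 as success, 1 as warnings, and anything higher as an error is an assumption based on the convention SAS documents for UNIX environments, so please verify it for your release.

```shell
#!/bin/sh
# Path to the SAS executable as used in this article; override via SAS_CMD (hypothetical variable).
SAS_CMD="${SAS_CMD:-/sas/SASHome/SASFoundation/9.4/sas}"

make_logname() {
    # $1 = full path of the SAS program, $2 = date/time stamp
    # Strip the .sas extension, then append _<stamp>.log
    echo "${1%.sas}_${2}.log"
}

run_sas_job() {
    # $1 = full path of the SAS program to run in batch
    pgmname=$1
    logname=$(make_logname "$pgmname" "$(date +%Y.%m.%d_%H.%M.%S)")
    "$SAS_CMD" "$pgmname" -log "$logname"
    rc=$?
    # Assumed convention: 0 = success, 1 = warnings, 2 or more = errors
    if [ "$rc" -eq 0 ]; then
        echo "OK: $pgmname (log: $logname)"
    elif [ "$rc" -eq 1 ]; then
        echo "WARNING: $pgmname (log: $logname)"
    else
        echo "ERROR: $pgmname exited with status $rc (log: $logname)"
    fi
    return "$rc"
}
```

Calling run_sas_job /sas/code/project1/program1.sas would then print one status line and pass the SAS exit status back to the caller.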

Unix script for submitting multiple SAS programs

Unix scripts may contain not only OS commands, but also other Unix script calls. You can mix-and-match OS commands and other script calls.

After scripts are created for each individual SAS program that you intend to run in a batch, you can easily combine them into a program flow by creating a flow script containing those single program scripts. For example, let’s create a script file /sas/code/project1/flow1.sh with the following contents:

/sas/code/project1/program1.sh
/sas/code/project1/program2.sh
/sas/code/project1/program3.sh

When submitted as

/sas/code/project1/flow1.sh

it will sequentially execute three scripts (program1.sh, program2.sh, and program3.sh), each of which will execute the corresponding SAS program (program1.sas, program2.sas, and program3.sas) and produce its own SAS log. If each script follows the program1.sh pattern shown earlier, the logs are named program1_<timestamp>.log, program2_<timestamp>.log, and program3_<timestamp>.log.
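
When later programs depend on earlier ones, it may help to stop the flow as soon as one step fails. A minimal sketch of such a flow follows. The run_flow function is hypothetical, and it assumes each component script exits with a non-zero status on failure (which holds when the SAS command is the last command in the script and SAS itself returns a non-zero status). Any non-zero status stops the flow, including whatever status your SAS release returns for warnings.

```shell
#!/bin/sh
run_flow() {
    # Run each script given as an argument, in order.
    for step in "$@"; do
        "$step"
        rc=$?
        if [ "$rc" -ne 0 ]; then
            # Do not start the remaining steps once one step has failed.
            echo "Flow stopped: $step exited with status $rc" >&2
            return "$rc"
        fi
    done
    echo "Flow completed: $# step(s)"
}
```

The flow could then be started as run_flow /sas/code/project1/program1.sh /sas/code/project1/program2.sh /sas/code/project1/program3.sh.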

If sequential execution of your SAS programs is not required, you can run them independently (in parallel) by modifying your flow1.sh script so the component scripts run in the Unix background mode; just add a space and an ampersand to the end of each command:

/sas/code/project1/program1.sh &
/sas/code/project1/program2.sh &
/sas/code/project1/program3.sh &
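
One caveat with background execution is that the flow script itself finishes immediately, without knowing whether the jobs succeeded. The sketch below, built around a hypothetical run_parallel function and again assuming that the component scripts return a meaningful exit status, uses the standard shell wait command to collect each job's status:

```shell
#!/bin/sh
run_parallel() {
    # Start every script in the background and remember its process ID.
    pids=""
    for step in "$@"; do
        "$step" &
        pids="$pids $!"
    done
    # Wait for each job in turn and count those that ended with a non-zero status.
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed job(s) failed"
    # The function succeeds only if every job succeeded.
    [ "$failed" -eq 0 ]
}
```

With this in place, run_parallel /sas/code/project1/program1.sh /sas/code/project1/program2.sh /sas/code/project1/program3.sh returns only after all three jobs have ended.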

Unix script file permissions

In order to be executable, UNIX script files must have certain permissions. If you create the script file and want to execute it yourself only, the file permissions can be as follows:

-rwxr-----, or 740 in octal representation.

This means that you (the Owner of the script file) have Read (r), Write (w), and Execute (x) permissions, shown by the first triplet (rwx); the Group owning the script file has only Read (r) permission, shown by the second triplet (r--); and Others have no permissions to the script file at all, shown by the third triplet (---).

If you want to give both yourself (Owner) and your Group execute permission, the script file permissions can be:

-rwxr-x---, or 750 in octal representation.

In this case, your group has Read (r) and Execute (x) permissions, shown by the second triplet (r-x).

You can assign file permissions using the chmod Unix command.

Note that in both examples above we do not give Others any permissions at all. Remember that file permissions are a security feature, and you should assign them at the minimum level necessary.
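
As an illustration of chmod, each octal digit is the sum of Read (4), Write (2), and Execute (1), so 7 = 4+2+1, 5 = 4+1, 4 = Read only, and 0 = no access. The sketch below wraps the two permission sets discussed above in a hypothetical helper, set_script_mode; the function name and its owner/group keywords are illustrative, not a standard command.

```shell
#!/bin/sh
set_script_mode() {
    # $1 = script file, $2 = who may execute it: "owner" or "group"
    case "$2" in
        owner) chmod 740 "$1" || return 1 ;;   # -rwxr-----
        group) chmod 750 "$1" || return 1 ;;   # -rwxr-x---
        *)     echo "usage: set_script_mode FILE owner|group" >&2
               return 2 ;;
    esac
    # Show the resulting permission string for confirmation.
    ls -l "$1"
}
```

For example, set_script_mode /sas/code/project1/program1.sh group has the same effect as chmod 750 /sas/code/project1/program1.sh, which can also be written symbolically as chmod u=rwx,g=rx,o= /sas/code/project1/program1.sh.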

Conditional execution of scripts and SAS programs

Here is an example of a Unix script file that allows running multiple SAS programs and OS commands at different times.

#!/bin/sh

#1 extract data from a database
/sas/code/etl/etl.sh

#2 copy data to the Visual Analytics autoload directory
scp -B userid@sasAPPservername:/sas/data/*.sas7bdat userid@sasVAservername:/sas/config/.../AutoLoad

#3 run weekly, every Monday
dow=$(date +%w)
if [ $dow -eq 1 ]
then
   /sas/code/alerts_generation.sh
fi

#4 run monthly, first Friday of every month
dom=$(date +%d)
if [ $dow -eq 5 -a $dom -le 7 ]
then
   /sas/code/update_history.sh
   /sas/code/update_transactions.sh
fi

In this script, the following test operators are used: -eq (equal to), -le (less than or equal to), -a (logical AND).

As you can see, the script logic takes care of branching to execute different SAS programs when certain timing conditions are met. With such an approach, you would need to schedule only this single script to run at a specified time/interval, say daily at 3 a.m. The script will “wake up” every morning at 3 a.m. and execute its component scripts either unconditionally, or conditionally.

If one of the included programs needs to run at a different, lower frequency (e.g. every Monday, or monthly on the first Friday of every month), the script logic will trigger those executions at the appropriate times.

In the above script example, steps #1 and #2 execute unconditionally every time the script runs (daily). Step #1 runs an ETL program to extract data from a database; step #2 copies the extracted data across the network from the SAS Application server to the SAS LASR Analytic Server's drop zone, from where they are automatically loaded (autoloaded) into LASR.

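Step #3 will run conditionally every Monday ($dow -eq 1); date +%w returns the day of the week as a number from 0 (Sunday) to 6 (Saturday). Step #4 will run conditionally on the first Friday of every month ($dow -eq 5 -a $dom -le 7): the first Friday always falls within the first seven days of the month, and no other Friday does.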
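
The timing tests can be checked without waiting for the calendar by moving them into small functions that take the day values as arguments. This is a sketch with hypothetical function names; it uses two bracket tests joined by && as an equivalent of the -a operator.

```shell
#!/bin/sh
is_monday() {
    # $1 = day of week from `date +%w` (0 = Sunday ... 6 = Saturday)
    [ "$1" -eq 1 ]
}

is_first_friday() {
    # $1 = day of week from `date +%w`, $2 = day of month from `date +%d`
    # A Friday within days 1-7 is necessarily the first Friday of the month.
    [ "$1" -eq 5 ] && [ "$2" -le 7 ]
}

describe_day() {
    # Print which steps would run for the given day-of-week and day-of-month.
    echo "daily steps run"
    if is_monday "$1"; then echo "weekly step runs"; fi
    if is_first_friday "$1" "$2"; then echo "monthly step runs"; fi
}
```

For today's date, the check would be describe_day "$(date +%w)" "$(date +%d)"; passing fixed values such as describe_day 5 1 makes it possible to try out a first Friday on any day.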

For more information on how to format date for use in shell scripts please refer to this post.

Do you run your SAS programs in batch?

Please share your batch experiences in the comment section below. I am sure the rest of us will really appreciate it!


About Author

Leonid Batkhan

Leonid Batkhan is a long-time SAS consultant and blogger. Currently, he is a Lead Applications Developer at F.N.B. Corporation. He holds a Ph.D. in Computer Science and Automatic Control Systems and has been a SAS user for more than 25 years. From 1995 to 2021 he worked as a Data Management and Business Intelligence consultant at SAS Institute. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries.

21 Comments

  1. Bhaskar Sundaram

    Hi @leonidbatkhan

    I am totally new to SAS, and in my current work the SAS programs run on a mainframe. The requirement is to check whether we can run them on Linux/Unix machines.

    Based on your detailed information, I am pretty sure that we can run those SAS programs on a Linux/Unix machine. But currently my Linux/Unix machine doesn't have SAS installed. Could you please explain which SAS version I need to install on my Linux machine, or does it have to align with the version in which the SAS programs were written?

    Thank you,
    Bhaskar

  2. Nice article.

    Some while back, I put together a generic script to run batch files in VBScript. Similar stuff, generally, but I also made use of the text box to provide some user interaction with the script.

  3. Robert Allison

    One other 'trick' I found useful for running SAS jobs in batch on Unix was using a Makefile, and setting it up such that the sas programs would be re-run any time the data files (csv) had been updated more recently than the sas output (in my case, the output was sas datasets, and graphs). That way when I got a few new csv files, all I had to do was type 'make', and it would figure out the dependencies, and re-run the sas jobs that needed to be run.

    Also, I almost always run my Windows sas jobs in batch, from a Windows command prompt window, and view my output (usually graphs) in a web browser. I make this easy by adding the location of the sas.exe to my search path, so I don't have to type the full path every time - I can just type "sas foo.sas" at the Windows command prompt.

  4. Once upon a time (5.17) running SAS meant using batch scripts.
    As running programs while you are not there is an important aspect of automating all kinds of things, I have always wondered at the focus on interactive usage as the only holy grail.

    SAS behaves differently in interactive and batch usage, with a lot of small points that need attention.
    The only real reason is that running in batch requires thinking about all kinds of error or warning conditions, as nobody is there to decide what to do.

    Using the SYSCC SAS macro variable to inform the OS script of the SAS code condition is an important step.
    Remember that CC stands for the Condition Code of mainframe job step processing. In Unix (Linux) it is a value between 0 and 255. The sh script can check it by using $?
    The common convention is: 0 means all is OK; non-zero means something is badly wrong.

  5. Hi,

    A very useful post: this topic must always be addressed in practice; thanks for sharing the guidelines.

    I strongly recommend running the launcher script from a non-interactive shell session (aka 'headless' session),
    namely using the 'nohup' command or any scheduler like 'at' or 'crontab' etc. Then you don't need any terminal window
    to remain open until the batch completes.

    • Thank you for great post, learning to run SAS in batch mode is a great problem solver in day to day activities!
      Small extension to your post - for automated SAS job scheduling with dependencies we use crontab + Python. It works quite well for us because Python pandas can read SAS data sets, thus we can program some conditions based on data we find in a dataset. Simple example of it could be - check if required data is uploaded to the dataset, if yes - launch SAS reporting job. If not - wait 5 minutes and repeat.

  6. Chris Hemedinger

    Leonid, this is a very good introduction to creating batch SAS jobs!

    I have a bevy of SAS batch scripts that run each morning to gather and report on these very blogs, SAS online communities, and the sassoftware GitHub presence. I use FILENAME EMAIL to send e-mail status reports to myself and to various other stakeholders so that the key results are communicated to the people who need to know. I rely on Unix cron jobs for the scheduling (not putting the logic in the script) to run the jobs at various times during the early morning and only on Monday-Friday.

    My batch jobs also create information and push it to the world. For example, all SAS hot fix announcements are automatically pushed to this communities announcement board, using the data provided by SAS Tech Support for all available hot fixes.

    • How can I run a lot of .egp projects in VBScript in sequence, with a return code condition (... the second egp must run only if the 1st egp ran successfully ...), and in parallel (are two calls to the prj.run() method executed sequentially or simultaneously)?
      Thanks

      • Chris Hemedinger

        EG projects (as you might know) don't have a return code per se. The more reliable method of checking a result would be to use VBScript to export/parse something from the SAS log, or to perhaps check for the existence of a data set that you expect to be created as a result of the project. The conditional logic for whether to open/run the next project in sequence would then all be in VBScript.

        A VBScript would run each project in sequence, not in parallel. The only way to run in parallel would be to launch multiple VBScripts simultaneously -- not a good idea if the projects have interdependencies, as it could create a race condition and get stuck.

        See more about EG Automation in this communities article.
